from:"Keshavamurthy, Anil S"

Re: [PATCH] intel-iommu: Fix array overflow

2007-10-25 Thread Keshavamurthy, Anil S

On Thu, Oct 25, 2007 at 09:31:02AM +0200, Takashi Iwai wrote:
> At Wed, 24 Oct 2007 16:30:37 -0700,
> Mark Gross wrote:
> > 
> > On Tue, Oct 23, 2007 at 10:57:51AM +0200, Takashi Iwai wrote:
> > > Fix possible array overflow:
> > > 
> > > drivers/pci/intel-iommu.c: In function ‘dmar_get_fault_reason’:
> > > drivers/pci/intel-iommu.c:753: warning: array subscript is above array 
> > > bounds
> > > drivers/pci/intel-iommu.c: In function ‘iommu_page_fault’:
> > > drivers/pci/intel-iommu.c:753: warning: array subscript is above array 
> > > bounds
> > > 
> > > Signed-off-by: Takashi Iwai <[EMAIL PROTECTED]>
> > > 
> > > ---
> > >  drivers/pci/intel-iommu.c |4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> > > index b3d7031..e4b0a0d 100644
> > > --- a/drivers/pci/intel-iommu.c
> > > +++ b/drivers/pci/intel-iommu.c
> > > @@ -749,8 +749,8 @@ static char *fault_reason_strings[] =
> > >  
> > >  char *dmar_get_fault_reason(u8 fault_reason)
> > >  {
> > > - if (fault_reason > MAX_FAULT_REASON_IDX)
> > > - return fault_reason_strings[MAX_FAULT_REASON_IDX];
> > > + if (fault_reason >= MAX_FAULT_REASON_IDX)
> > > + return fault_reason_strings[MAX_FAULT_REASON_IDX - 1];
> > 
> > This looks like what the code meant to implement.
> 
> I think not.  The size of fault_reason_strings[] is
> MAX_FAULT_REASON_IDX, not + 1.  So gcc warning is correct.
> Maybe the main problem is that the constant name is confusing...
Yup, GCC warning is correct. the size of fault_reason_strings[]
is MAX_FAULT_REASON_IDX and hence the max array that can be 
referenced is [MAX_FAULT_REASON_IDX -1], hence the fix by
Takashi is correct.

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] intel-iommu: Fix array overflow

2007-10-25 Thread Keshavamurthy, Anil S

On Thu, Oct 25, 2007 at 09:31:02AM +0200, Takashi Iwai wrote:
 At Wed, 24 Oct 2007 16:30:37 -0700,
 Mark Gross wrote:
  
  On Tue, Oct 23, 2007 at 10:57:51AM +0200, Takashi Iwai wrote:
   Fix possible array overflow:
   
   drivers/pci/intel-iommu.c: In function ‘dmar_get_fault_reason’:
   drivers/pci/intel-iommu.c:753: warning: array subscript is above array 
   bounds
   drivers/pci/intel-iommu.c: In function ‘iommu_page_fault’:
   drivers/pci/intel-iommu.c:753: warning: array subscript is above array 
   bounds
   
   Signed-off-by: Takashi Iwai [EMAIL PROTECTED]
   
   ---
drivers/pci/intel-iommu.c |4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
   
   diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
   index b3d7031..e4b0a0d 100644
   --- a/drivers/pci/intel-iommu.c
   +++ b/drivers/pci/intel-iommu.c
   @@ -749,8 +749,8 @@ static char *fault_reason_strings[] =

char *dmar_get_fault_reason(u8 fault_reason)
{
   - if (fault_reason  MAX_FAULT_REASON_IDX)
   - return fault_reason_strings[MAX_FAULT_REASON_IDX];
   + if (fault_reason = MAX_FAULT_REASON_IDX)
   + return fault_reason_strings[MAX_FAULT_REASON_IDX - 1];
  
  This looks like what the code meant to implement.
 
 I think not.  The size of fault_reason_strings[] is
 MAX_FAULT_REASON_IDX, not + 1.  So gcc warning is correct.
 Maybe the main problem is that the constant name is confusing...
Yup, GCC warning is correct. the size of fault_reason_strings[]
is MAX_FAULT_REASON_IDX and hence the max array that can be 
referenced is [MAX_FAULT_REASON_IDX -1], hence the fix by
Takashi is correct.

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH]: Genericizing iova.[ch]

2007-10-23 Thread Keshavamurthy, Anil S

On Tue, Oct 23, 2007 at 01:01:56AM -0700, David Miller wrote:
> 
> I would like to potentially move the sparc64 IOMMU code over to using
> the nice new drivers/pci/iova.[ch] code for free area management..
Welcome!! 
> 
> In order to do that we have to detach the IOMMU page size assumptions
> which only really need to exist in the intel-iommu.[ch] code.
> 
> This patch attempts to implement that.
Your patch looks good to me.

Andrew,
Please queue this patch in your mm tree.
> 
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

> 
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index b3d7031..7f2aec4 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -1089,7 +1089,7 @@ static void dmar_init_reserved_ranges(void)
>   int i;
>   u64 addr, size;
>  
> - init_iova_domain(_iova_list);
> + init_iova_domain(_iova_list, DMA_32BIT_PFN);
>  
>   /* IOAPIC ranges shouldn't be accessed by DMA */
>   iova = reserve_iova(_iova_list, IOVA_PFN(IOAPIC_RANGE_START),
> @@ -1143,7 +1143,7 @@ static int domain_init(struct dmar_domain *domain, int 
> guest_width)
>   int adjust_width, agaw;
>   unsigned long sagaw;
>  
> - init_iova_domain(>iovad);
> + init_iova_domain(>iovad, DMA_32BIT_PFN);
>   spin_lock_init(>mapping_lock);
>  
>   domain_reserve_special_ranges(domain);
> diff --git a/drivers/pci/intel-iommu.h b/drivers/pci/intel-iommu.h
> index ee88dd2..af8560c 100644
> --- a/drivers/pci/intel-iommu.h
> +++ b/drivers/pci/intel-iommu.h
> @@ -27,6 +27,19 @@
>  #include 
>  
>  /*
> + * We need a fixed PAGE_SIZE of 4K irrespective of
> + * arch PAGE_SIZE for IOMMU page tables.
> + */
> +#define PAGE_SHIFT_4K(12)
> +#define PAGE_SIZE_4K (1UL << PAGE_SHIFT_4K)
> +#define PAGE_MASK_4K (((u64)-1) << PAGE_SHIFT_4K)
> +#define PAGE_ALIGN_4K(addr)  (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
> +
> +#define IOVA_PFN(addr)   ((addr) >> PAGE_SHIFT_4K)
> +#define DMA_32BIT_PFNIOVA_PFN(DMA_32BIT_MASK)
> +#define DMA_64BIT_PFNIOVA_PFN(DMA_64BIT_MASK)
> +
> +/*
>   * Intel IOMMU register specification per version 1.0 public spec.
>   */
>  
> diff --git a/drivers/pci/iova.c b/drivers/pci/iova.c
> index a84571c..8de7ab6 100644
> --- a/drivers/pci/iova.c
> +++ b/drivers/pci/iova.c
> @@ -9,19 +9,19 @@
>  #include "iova.h"
>  
>  void
> -init_iova_domain(struct iova_domain *iovad)
> +init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit)
>  {
>   spin_lock_init(>iova_alloc_lock);
>   spin_lock_init(>iova_rbtree_lock);
>   iovad->rbroot = RB_ROOT;
>   iovad->cached32_node = NULL;
> -
> + iovad->dma_32bit_pfn = pfn_32bit;
>  }
>  
>  static struct rb_node *
>  __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
>  {
> - if ((*limit_pfn != DMA_32BIT_PFN) ||
> + if ((*limit_pfn != iovad->dma_32bit_pfn) ||
>   (iovad->cached32_node == NULL))
>   return rb_last(>rbroot);
>   else {
> @@ -37,7 +37,7 @@ static void
>  __cached_rbnode_insert_update(struct iova_domain *iovad,
>   unsigned long limit_pfn, struct iova *new)
>  {
> - if (limit_pfn != DMA_32BIT_PFN)
> + if (limit_pfn != iovad->dma_32bit_pfn)
>   return;
>   iovad->cached32_node = >node;
>  }
> diff --git a/drivers/pci/iova.h b/drivers/pci/iova.h
> index ae3028d..d521b5b 100644
> --- a/drivers/pci/iova.h
> +++ b/drivers/pci/iova.h
> @@ -15,22 +15,9 @@
>  #include 
>  #include 
>  
> -/*
> - * We need a fixed PAGE_SIZE of 4K irrespective of
> - * arch PAGE_SIZE for IOMMU page tables.
> - */
> -#define PAGE_SHIFT_4K(12)
> -#define PAGE_SIZE_4K (1UL << PAGE_SHIFT_4K)
> -#define PAGE_MASK_4K (((u64)-1) << PAGE_SHIFT_4K)
> -#define PAGE_ALIGN_4K(addr)  (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
> -
>  /* IO virtual address start page frame number */
>  #define IOVA_START_PFN   (1)
>  
> -#define IOVA_PFN(addr)   ((addr) >> PAGE_SHIFT_4K)
> -#define DMA_32BIT_PFNIOVA_PFN(DMA_32BIT_MASK)
> -#define DMA_64BIT_PFNIOVA_PFN(DMA_64BIT_MASK)
> -
>  /* iova structure */
>  struct iova {
>   struct rb_node  node;
> @@ -44,6 +31,7 @@ struct iova_domain {
>   spinlock_t  iova_rbtree_lock; /* Lock to protect update of rbtree */
>   struct rb_root  rbroot; /* iova domain rbtree root */
>   struct rb_node  *cached32_node; /* Save last alloced node */
> + unsigned long   dma_32bit_pfn;
>  };
>  
>  struct iova *alloc_iova_mem(void);
> @@ -56,7 +44,7 @@ struct iova *alloc_iova(struct iova_domain *iovad, unsigned 
> long size,
>  struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
>   unsigned long pfn_hi);
>  void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to);
> -void init_iova_domain(struct iova_domain

Re: [PATCH]: Genericizing iova.[ch]

2007-10-23 Thread Keshavamurthy, Anil S

On Tue, Oct 23, 2007 at 01:01:56AM -0700, David Miller wrote:
 
 I would like to potentially move the sparc64 IOMMU code over to using
 the nice new drivers/pci/iova.[ch] code for free area management..
Welcome!! 
 
 In order to do that we have to detach the IOMMU page size assumptions
 which only really need to exist in the intel-iommu.[ch] code.
 
 This patch attempts to implement that.
Your patch looks good to me.

Andrew,
Please queue this patch in your mm tree.
 
 Signed-off-by: David S. Miller [EMAIL PROTECTED]
Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]

 
 diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
 index b3d7031..7f2aec4 100644
 --- a/drivers/pci/intel-iommu.c
 +++ b/drivers/pci/intel-iommu.c
 @@ -1089,7 +1089,7 @@ static void dmar_init_reserved_ranges(void)
   int i;
   u64 addr, size;
  
 - init_iova_domain(reserved_iova_list);
 + init_iova_domain(reserved_iova_list, DMA_32BIT_PFN);
  
   /* IOAPIC ranges shouldn't be accessed by DMA */
   iova = reserve_iova(reserved_iova_list, IOVA_PFN(IOAPIC_RANGE_START),
 @@ -1143,7 +1143,7 @@ static int domain_init(struct dmar_domain *domain, int 
 guest_width)
   int adjust_width, agaw;
   unsigned long sagaw;
  
 - init_iova_domain(domain-iovad);
 + init_iova_domain(domain-iovad, DMA_32BIT_PFN);
   spin_lock_init(domain-mapping_lock);
  
   domain_reserve_special_ranges(domain);
 diff --git a/drivers/pci/intel-iommu.h b/drivers/pci/intel-iommu.h
 index ee88dd2..af8560c 100644
 --- a/drivers/pci/intel-iommu.h
 +++ b/drivers/pci/intel-iommu.h
 @@ -27,6 +27,19 @@
  #include linux/io.h
  
  /*
 + * We need a fixed PAGE_SIZE of 4K irrespective of
 + * arch PAGE_SIZE for IOMMU page tables.
 + */
 +#define PAGE_SHIFT_4K(12)
 +#define PAGE_SIZE_4K (1UL  PAGE_SHIFT_4K)
 +#define PAGE_MASK_4K (((u64)-1)  PAGE_SHIFT_4K)
 +#define PAGE_ALIGN_4K(addr)  (((addr) + PAGE_SIZE_4K - 1)  PAGE_MASK_4K)
 +
 +#define IOVA_PFN(addr)   ((addr)  PAGE_SHIFT_4K)
 +#define DMA_32BIT_PFNIOVA_PFN(DMA_32BIT_MASK)
 +#define DMA_64BIT_PFNIOVA_PFN(DMA_64BIT_MASK)
 +
 +/*
   * Intel IOMMU register specification per version 1.0 public spec.
   */
  
 diff --git a/drivers/pci/iova.c b/drivers/pci/iova.c
 index a84571c..8de7ab6 100644
 --- a/drivers/pci/iova.c
 +++ b/drivers/pci/iova.c
 @@ -9,19 +9,19 @@
  #include iova.h
  
  void
 -init_iova_domain(struct iova_domain *iovad)
 +init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit)
  {
   spin_lock_init(iovad-iova_alloc_lock);
   spin_lock_init(iovad-iova_rbtree_lock);
   iovad-rbroot = RB_ROOT;
   iovad-cached32_node = NULL;
 -
 + iovad-dma_32bit_pfn = pfn_32bit;
  }
  
  static struct rb_node *
  __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
  {
 - if ((*limit_pfn != DMA_32BIT_PFN) ||
 + if ((*limit_pfn != iovad-dma_32bit_pfn) ||
   (iovad-cached32_node == NULL))
   return rb_last(iovad-rbroot);
   else {
 @@ -37,7 +37,7 @@ static void
  __cached_rbnode_insert_update(struct iova_domain *iovad,
   unsigned long limit_pfn, struct iova *new)
  {
 - if (limit_pfn != DMA_32BIT_PFN)
 + if (limit_pfn != iovad-dma_32bit_pfn)
   return;
   iovad-cached32_node = new-node;
  }
 diff --git a/drivers/pci/iova.h b/drivers/pci/iova.h
 index ae3028d..d521b5b 100644
 --- a/drivers/pci/iova.h
 +++ b/drivers/pci/iova.h
 @@ -15,22 +15,9 @@
  #include linux/rbtree.h
  #include linux/dma-mapping.h
  
 -/*
 - * We need a fixed PAGE_SIZE of 4K irrespective of
 - * arch PAGE_SIZE for IOMMU page tables.
 - */
 -#define PAGE_SHIFT_4K(12)
 -#define PAGE_SIZE_4K (1UL  PAGE_SHIFT_4K)
 -#define PAGE_MASK_4K (((u64)-1)  PAGE_SHIFT_4K)
 -#define PAGE_ALIGN_4K(addr)  (((addr) + PAGE_SIZE_4K - 1)  PAGE_MASK_4K)
 -
  /* IO virtual address start page frame number */
  #define IOVA_START_PFN   (1)
  
 -#define IOVA_PFN(addr)   ((addr)  PAGE_SHIFT_4K)
 -#define DMA_32BIT_PFNIOVA_PFN(DMA_32BIT_MASK)
 -#define DMA_64BIT_PFNIOVA_PFN(DMA_64BIT_MASK)
 -
  /* iova structure */
  struct iova {
   struct rb_node  node;
 @@ -44,6 +31,7 @@ struct iova_domain {
   spinlock_t  iova_rbtree_lock; /* Lock to protect update of rbtree */
   struct rb_root  rbroot; /* iova domain rbtree root */
   struct rb_node  *cached32_node; /* Save last alloced node */
 + unsigned long   dma_32bit_pfn;
  };
  
  struct iova *alloc_iova_mem(void);
 @@ -56,7 +44,7 @@ struct iova *alloc_iova(struct iova_domain *iovad, unsigned 
 long size,
  struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
   unsigned long pfn_hi);
  void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to);
 -void init_iova_domain(struct iova_domain *iovad);
 +void init_iova_domain(struct iova_domain *iovad,

Re: [PATCH] intel-iommu: fix sg_page()

2007-10-22 Thread Keshavamurthy, Anil S

On Tue, Oct 23, 2007 at 01:59:26PM +0900, FUJITA Tomonori wrote:
> drivers/pci/intel-iommu.c: In function 'intel_unmap_sg':
> drivers/pci/intel-iommu.c:1987: error: 'struct scatterlist' has no member 
> named 'page'
> drivers/pci/intel-iommu.c: In function 'intel_nontranslate_map_sg':
> drivers/pci/intel-iommu.c:2013: error: 'struct scatterlist' has no member 
> named 'page'
> drivers/pci/intel-iommu.c:2014: error: 'struct scatterlist' has no member 
> named 'page'
> drivers/pci/intel-iommu.c: In function 'intel_map_sg':
> drivers/pci/intel-iommu.c:2044: error: 'struct scatterlist' has no member 
> named 'page'
> drivers/pci/intel-iommu.c:2068: error: 'struct scatterlist' has no member 
> named 'page'
> 
> Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
Acked by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

> ---
>  drivers/pci/intel-iommu.c |   11 +--
>  1 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index b3d7031..0f6e528 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -1962,7 +1962,6 @@ static void intel_free_coherent(struct device *hwdev, 
> size_t size,
>   free_pages((unsigned long)vaddr, order);
>  }
>  
> -#define SG_ENT_VIRT_ADDRESS(sg)  (page_address((sg)->page) + 
> (sg)->offset)
>  static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sglist,
>   int nelems, int dir)
>  {
> @@ -1984,7 +1983,7 @@ static void intel_unmap_sg(struct device *hwdev, struct 
> scatterlist *sglist,
>   if (!iova)
>   return;
>   for_each_sg(sglist, sg, nelems, i) {
> - addr = SG_ENT_VIRT_ADDRESS(sg);
> + addr = sg_virt(sg);
>   size += aligned_size((u64)addr, sg->length);
>   }
>  
> @@ -2010,8 +2009,8 @@ static int intel_nontranslate_map_sg(struct device 
> *hddev,
>   struct scatterlist *sg;
>  
>   for_each_sg(sglist, sg, nelems, i) {
> - BUG_ON(!sg->page);
> - sg->dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(sg));
> + BUG_ON(!sg_page(sg));
> + sg->dma_address = virt_to_bus(sg_virt(sg));
>   sg->dma_length = sg->length;
>   }
>   return nelems;
> @@ -2041,7 +2040,7 @@ static int intel_map_sg(struct device *hwdev, struct 
> scatterlist *sglist,
>   return 0;
>  
>   for_each_sg(sglist, sg, nelems, i) {
> - addr = SG_ENT_VIRT_ADDRESS(sg);
> + addr = sg_virt(sg);
>   addr = (void *)virt_to_phys(addr);
>   size += aligned_size((u64)addr, sg->length);
>   }
> @@ -2065,7 +2064,7 @@ static int intel_map_sg(struct device *hwdev, struct 
> scatterlist *sglist,
>   start_addr = iova->pfn_lo << PAGE_SHIFT_4K;
>   offset = 0;
>   for_each_sg(sglist, sg, nelems, i) {
> - addr = SG_ENT_VIRT_ADDRESS(sg);
> + addr = sg_virt(sg);
>   addr = (void *)virt_to_phys(addr);
>   size = aligned_size((u64)addr, sg->length);
>   ret = domain_page_mapping(domain, start_addr + offset,
> -- 
> 1.5.2.4
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] intel-iommu: fix sg_page()

2007-10-22 Thread Keshavamurthy, Anil S

On Tue, Oct 23, 2007 at 01:59:26PM +0900, FUJITA Tomonori wrote:
 drivers/pci/intel-iommu.c: In function 'intel_unmap_sg':
 drivers/pci/intel-iommu.c:1987: error: 'struct scatterlist' has no member 
 named 'page'
 drivers/pci/intel-iommu.c: In function 'intel_nontranslate_map_sg':
 drivers/pci/intel-iommu.c:2013: error: 'struct scatterlist' has no member 
 named 'page'
 drivers/pci/intel-iommu.c:2014: error: 'struct scatterlist' has no member 
 named 'page'
 drivers/pci/intel-iommu.c: In function 'intel_map_sg':
 drivers/pci/intel-iommu.c:2044: error: 'struct scatterlist' has no member 
 named 'page'
 drivers/pci/intel-iommu.c:2068: error: 'struct scatterlist' has no member 
 named 'page'
 
 Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
Acked by: Anil S Keshavamurthy [EMAIL PROTECTED]

 ---
  drivers/pci/intel-iommu.c |   11 +--
  1 files changed, 5 insertions(+), 6 deletions(-)
 
 diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
 index b3d7031..0f6e528 100644
 --- a/drivers/pci/intel-iommu.c
 +++ b/drivers/pci/intel-iommu.c
 @@ -1962,7 +1962,6 @@ static void intel_free_coherent(struct device *hwdev, 
 size_t size,
   free_pages((unsigned long)vaddr, order);
  }
  
 -#define SG_ENT_VIRT_ADDRESS(sg)  (page_address((sg)-page) + 
 (sg)-offset)
  static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sglist,
   int nelems, int dir)
  {
 @@ -1984,7 +1983,7 @@ static void intel_unmap_sg(struct device *hwdev, struct 
 scatterlist *sglist,
   if (!iova)
   return;
   for_each_sg(sglist, sg, nelems, i) {
 - addr = SG_ENT_VIRT_ADDRESS(sg);
 + addr = sg_virt(sg);
   size += aligned_size((u64)addr, sg-length);
   }
  
 @@ -2010,8 +2009,8 @@ static int intel_nontranslate_map_sg(struct device 
 *hddev,
   struct scatterlist *sg;
  
   for_each_sg(sglist, sg, nelems, i) {
 - BUG_ON(!sg-page);
 - sg-dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(sg));
 + BUG_ON(!sg_page(sg));
 + sg-dma_address = virt_to_bus(sg_virt(sg));
   sg-dma_length = sg-length;
   }
   return nelems;
 @@ -2041,7 +2040,7 @@ static int intel_map_sg(struct device *hwdev, struct 
 scatterlist *sglist,
   return 0;
  
   for_each_sg(sglist, sg, nelems, i) {
 - addr = SG_ENT_VIRT_ADDRESS(sg);
 + addr = sg_virt(sg);
   addr = (void *)virt_to_phys(addr);
   size += aligned_size((u64)addr, sg-length);
   }
 @@ -2065,7 +2064,7 @@ static int intel_map_sg(struct device *hwdev, struct 
 scatterlist *sglist,
   start_addr = iova-pfn_lo  PAGE_SHIFT_4K;
   offset = 0;
   for_each_sg(sglist, sg, nelems, i) {
 - addr = SG_ENT_VIRT_ADDRESS(sg);
 + addr = sg_virt(sg);
   addr = (void *)virt_to_phys(addr);
   size = aligned_size((u64)addr, sg-length);
   ret = domain_page_mapping(domain, start_addr + offset,
 -- 
 1.5.2.4
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch take 2][Intel-IOMMU] Fix for IOMMU early crash

2007-10-04 Thread Keshavamurthy, Anil S

On Thu, Oct 04, 2007 at 01:39:39PM +1000, Benjamin Herrenschmidt wrote:
> 
> > > Why don't you use the new struct dev_archdata mechanism ? That's what I
> > > use on powerpc to provide optional iommu linkage to any device in the
> > > system.
> > Good one. I will certainly try out your idea and will update the list
> > tomorrow.
> 
> The advantage is that it allows to completely isolate the iommu code
> from any dependency to PCI, which means you can implement DMA ops
> support for various platform devices or other fancy things. Maybe not
> the most useful in x86-land, but still ;-)

Andrew,
Please delete my previous patch and add the below patch 
to your MM queue. 

I guess the patch name is "intel-iommu-fix-for-iommu-early-crash.patch".

Thanks,
Anil
--

Subject: [Intel-IOMMU] Fix for IOMMU early crash

pci_dev's->sysdata is highly overloaded and currently
IOMMU is broken due to IOMMU code depending on this field.

This patch introduces new field in pci_dev's dev.archdata struct to
hold IOMMU specific per device IOMMU private data.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 drivers/pci/intel-iommu.c   |   22 +++---
 include/asm-x86_64/device.h |3 +++
 2 files changed, 14 insertions(+), 11 deletions(-)

Index: 2.6-mm/drivers/pci/intel-iommu.c
===
--- 2.6-mm.orig/drivers/pci/intel-iommu.c   2007-10-04 11:35:09.0 
-0700
+++ 2.6-mm/drivers/pci/intel-iommu.c2007-10-04 11:47:47.0 -0700
@@ -1348,7 +1348,7 @@
list_del(>link);
list_del(>global);
if (info->dev)
-   info->dev->sysdata = NULL;
+   info->dev->dev.archdata.iommu = NULL;
spin_unlock_irqrestore(_domain_lock, flags);
 
detach_domain_for_dev(info->domain, info->bus, info->devfn);
@@ -1361,7 +1361,7 @@
 
 /*
  * find_domain
- * Note: we use struct pci_dev->sysdata stores the info
+ * Note: we use struct pci_dev->dev.archdata.iommu stores the info
  */
 struct dmar_domain *
 find_domain(struct pci_dev *pdev)
@@ -1369,7 +1369,7 @@
struct device_domain_info *info;
 
/* No lock here, assumes no domain exit in normal case */
-   info = pdev->sysdata;
+   info = pdev->dev.archdata.iommu;
if (info)
return info->domain;
return NULL;
@@ -1519,7 +1519,7 @@
}
list_add(>link, >devices);
list_add(>global, _domain_list);
-   pdev->sysdata = info;
+   pdev->dev.archdata.iommu = info;
spin_unlock_irqrestore(_domain_lock, flags);
return domain;
 error:
@@ -1579,7 +1579,7 @@
 static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr,
struct pci_dev *pdev)
 {
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
return 0;
return iommu_prepare_identity_map(pdev, rmrr->base_address,
rmrr->end_address + 1);
@@ -1595,7 +1595,7 @@
int ret;
 
for_each_pci_dev(pdev) {
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO ||
+   if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO ||
!IS_GFX_DEVICE(pdev))
continue;
printk(KERN_INFO "IOMMU: gfx device %s 1-1 mapping\n",
@@ -1836,7 +1836,7 @@
int prot = 0;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
return virt_to_bus(addr);
 
domain = get_valid_domain_for_dev(pdev);
@@ -1900,7 +1900,7 @@
unsigned long start_addr;
struct iova *iova;
 
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
return;
domain = find_domain(pdev);
BUG_ON(!domain);
@@ -1974,7 +1974,7 @@
size_t size = 0;
void *addr;
 
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
return;
 
domain = find_domain(pdev);
@@ -2032,7 +2032,7 @@
unsigned long start_addr;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
return intel_nontranslate_map_sg(hwdev, sg, nelems, dir);
 
domain = get_valid_domain_for_dev(pdev);
@@ -2234,7 +2234,7 @@
for (i = 0; i < drhd->devices_cnt; i++) {
if (!drhd->devices[i])
continue;
-   drhd->devices[i]->sysdata = DUMMY_DEVICE_DOMAIN_INFO;
+   drhd->devices[i]->dev.archdata.iommu =

Re: [patch take 2][Intel-IOMMU] Fix for IOMMU early crash

2007-10-04 Thread Keshavamurthy, Anil S

On Thu, Oct 04, 2007 at 01:39:39PM +1000, Benjamin Herrenschmidt wrote:
 
   Why don't you use the new struct dev_archdata mechanism ? That's what I
   use on powerpc to provide optional iommu linkage to any device in the
   system.
  Good one. I will certainly try out your idea and will update the list
  tomorrow.
 
 The advantage is that it allows to completely isolate the iommu code
 from any dependency to PCI, which means you can implement DMA ops
 support for various platform devices or other fancy things. Maybe not
 the most useful in x86-land, but still ;-)

Andrew,
Please delete my previous patch and add the below patch 
to your MM queue. 

I guess the patch name is intel-iommu-fix-for-iommu-early-crash.patch.

Thanks,
Anil
--

Subject: [Intel-IOMMU] Fix for IOMMU early crash

pci_dev's-sysdata is highly overloaded and currently
IOMMU is broken due to IOMMU code depending on this field.

This patch introduces new field in pci_dev's dev.archdata struct to
hold IOMMU specific per device IOMMU private data.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]

---
 drivers/pci/intel-iommu.c   |   22 +++---
 include/asm-x86_64/device.h |3 +++
 2 files changed, 14 insertions(+), 11 deletions(-)

Index: 2.6-mm/drivers/pci/intel-iommu.c
===
--- 2.6-mm.orig/drivers/pci/intel-iommu.c   2007-10-04 11:35:09.0 
-0700
+++ 2.6-mm/drivers/pci/intel-iommu.c2007-10-04 11:47:47.0 -0700
@@ -1348,7 +1348,7 @@
list_del(info-link);
list_del(info-global);
if (info-dev)
-   info-dev-sysdata = NULL;
+   info-dev-dev.archdata.iommu = NULL;
spin_unlock_irqrestore(device_domain_lock, flags);
 
detach_domain_for_dev(info-domain, info-bus, info-devfn);
@@ -1361,7 +1361,7 @@
 
 /*
  * find_domain
- * Note: we use struct pci_dev-sysdata stores the info
+ * Note: we use struct pci_dev-dev.archdata.iommu stores the info
  */
 struct dmar_domain *
 find_domain(struct pci_dev *pdev)
@@ -1369,7 +1369,7 @@
struct device_domain_info *info;
 
/* No lock here, assumes no domain exit in normal case */
-   info = pdev-sysdata;
+   info = pdev-dev.archdata.iommu;
if (info)
return info-domain;
return NULL;
@@ -1519,7 +1519,7 @@
}
list_add(info-link, domain-devices);
list_add(info-global, device_domain_list);
-   pdev-sysdata = info;
+   pdev-dev.archdata.iommu = info;
spin_unlock_irqrestore(device_domain_lock, flags);
return domain;
 error:
@@ -1579,7 +1579,7 @@
 static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr,
struct pci_dev *pdev)
 {
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
return 0;
return iommu_prepare_identity_map(pdev, rmrr-base_address,
rmrr-end_address + 1);
@@ -1595,7 +1595,7 @@
int ret;
 
for_each_pci_dev(pdev) {
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO ||
+   if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO ||
!IS_GFX_DEVICE(pdev))
continue;
printk(KERN_INFO IOMMU: gfx device %s 1-1 mapping\n,
@@ -1836,7 +1836,7 @@
int prot = 0;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
return virt_to_bus(addr);
 
domain = get_valid_domain_for_dev(pdev);
@@ -1900,7 +1900,7 @@
unsigned long start_addr;
struct iova *iova;
 
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
return;
domain = find_domain(pdev);
BUG_ON(!domain);
@@ -1974,7 +1974,7 @@
size_t size = 0;
void *addr;
 
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
return;
 
domain = find_domain(pdev);
@@ -2032,7 +2032,7 @@
unsigned long start_addr;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
return intel_nontranslate_map_sg(hwdev, sg, nelems, dir);
 
domain = get_valid_domain_for_dev(pdev);
@@ -2234,7 +2234,7 @@
for (i = 0; i  drhd-devices_cnt; i++) {
if (!drhd-devices[i])
continue;
-   drhd-devices[i]-sysdata = DUMMY_DEVICE_DOMAIN_INFO;
+   drhd-devices[i]-dev.archdata.iommu = 
DUMMY_DEVICE_DOMAIN_INFO;

Re: [patch take 2][Intel-IOMMU] Fix for IOMMU early crash

2007-10-03 Thread Keshavamurthy, Anil S

On Thu, Oct 04, 2007 at 11:19:33AM +1000, Benjamin Herrenschmidt wrote:
> > Index: 2.6-mm/include/linux/pci.h
> > ===
> > --- 2.6-mm.orig/include/linux/pci.h 2007-10-03 13:48:20.0 -0700
> > +++ 2.6-mm/include/linux/pci.h  2007-10-03 13:49:08.0 -0700
> > @@ -195,6 +195,7 @@
> >  #ifdef CONFIG_PCI_MSI
> > struct list_head msi_list;
> >  #endif
> > +   void*iommu_private; /* hook for IOMMU specific extension */
> >  };
> 
> I'm not fan of this. That would imply that iommu stuff is specific to
> PCI or that sort of thing.
> 
> Why don't you use the new struct dev_archdata mechanism ? That's what I
> use on powerpc to provide optional iommu linkage to any device in the
> system.
Good one. I will certainly try out your idea and will update the list
tomorrow.

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH -mm] intel-iommu sg chaining support

2007-10-03 Thread Keshavamurthy, Anil S

On Wed, Oct 03, 2007 at 02:12:03PM -0700, Andrew Morton wrote:
> On Mon, 1 Oct 2007 09:12:56 -0700
> "Keshavamurthy, Anil S" <[EMAIL PROTECTED]> wrote:
> 
> > On Sat, Sep 29, 2007 at 05:16:38AM -0700, FUJITA Tomonori wrote:
> > > 
> > >x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't
> > >support sg chaining, we will get data corruption.
> > >Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
> > 
> > Acked-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>
> > 
> 
> Am I correct in believing that this patch is needed only when the
> chaining patches which are presently in git-block are combined with the
> intel-iommu work which is presently in -mm?
Yes, that is correct. 

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch take 2][Intel-IOMMU] Fix for IOMMU early crash

2007-10-03 Thread Keshavamurthy, Anil S

Subject: [patch][Intel-IOMMU] Fix for IOMMU early crash

pci_dev's->sysdata is highly overloaded and currently
IOMMU is broken due to IOMMU code depending on this field.

This patch introduces new field in pci_dev's struct to
hold IOMMU specific per device IOMMU private data.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 drivers/pci/intel-iommu.c |   22 +++---
 include/linux/pci.h   |1 +
 2 files changed, 12 insertions(+), 11 deletions(-)

Index: 2.6-mm/drivers/pci/intel-iommu.c
===
--- 2.6-mm.orig/drivers/pci/intel-iommu.c   2007-10-03 13:48:18.0 
-0700
+++ 2.6-mm/drivers/pci/intel-iommu.c2007-10-03 13:48:41.0 -0700
@@ -1348,7 +1348,7 @@
list_del(>link);
list_del(>global);
if (info->dev)
-   info->dev->sysdata = NULL;
+   info->dev->iommu_private = NULL;
spin_unlock_irqrestore(_domain_lock, flags);
 
detach_domain_for_dev(info->domain, info->bus, info->devfn);
@@ -1361,7 +1361,7 @@
 
 /*
  * find_domain
- * Note: we use struct pci_dev->sysdata stores the info
+ * Note: we use struct pci_dev->iommu_private stores the info
  */
 struct dmar_domain *
 find_domain(struct pci_dev *pdev)
@@ -1369,7 +1369,7 @@
struct device_domain_info *info;
 
/* No lock here, assumes no domain exit in normal case */
-   info = pdev->sysdata;
+   info = pdev->iommu_private;
if (info)
return info->domain;
return NULL;
@@ -1519,7 +1519,7 @@
}
list_add(>link, >devices);
list_add(>global, _domain_list);
-   pdev->sysdata = info;
+   pdev->iommu_private = info;
spin_unlock_irqrestore(_domain_lock, flags);
return domain;
 error:
@@ -1579,7 +1579,7 @@
 static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr,
struct pci_dev *pdev)
 {
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO)
return 0;
return iommu_prepare_identity_map(pdev, rmrr->base_address,
rmrr->end_address + 1);
@@ -1595,7 +1595,7 @@
int ret;
 
for_each_pci_dev(pdev) {
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO ||
+   if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO ||
!IS_GFX_DEVICE(pdev))
continue;
printk(KERN_INFO "IOMMU: gfx device %s 1-1 mapping\n",
@@ -1836,7 +1836,7 @@
int prot = 0;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO)
return virt_to_bus(addr);
 
domain = get_valid_domain_for_dev(pdev);
@@ -1900,7 +1900,7 @@
unsigned long start_addr;
struct iova *iova;
 
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO)
return;
domain = find_domain(pdev);
BUG_ON(!domain);
@@ -1974,7 +1974,7 @@
size_t size = 0;
void *addr;
 
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO)
return;
 
domain = find_domain(pdev);
@@ -2032,7 +2032,7 @@
unsigned long start_addr;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO)
return intel_nontranslate_map_sg(hwdev, sg, nelems, dir);
 
domain = get_valid_domain_for_dev(pdev);
@@ -2234,7 +2234,7 @@
for (i = 0; i < drhd->devices_cnt; i++) {
if (!drhd->devices[i])
continue;
-   drhd->devices[i]->sysdata = DUMMY_DEVICE_DOMAIN_INFO;
+   drhd->devices[i]->iommu_private = 
DUMMY_DEVICE_DOMAIN_INFO;
}
}
 }
Index: 2.6-mm/include/linux/pci.h
===
--- 2.6-mm.orig/include/linux/pci.h 2007-10-03 13:48:20.0 -0700
+++ 2.6-mm/include/linux/pci.h  2007-10-03 13:49:08.0 -0700
@@ -195,6 +195,7 @@
 #ifdef CONFIG_PCI_MSI
struct list_head msi_list;
 #endif
+   void*iommu_private; /* hook for IOMMU specific extension */
 };
 
 extern struct pci_dev *alloc_pci_dev(void);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch take 2][Intel-IOMMU] Fix for IOMMU early crash

2007-10-03 Thread Keshavamurthy, Anil S

Subject: [patch][Intel-IOMMU] Fix for IOMMU early crash

pci_dev's-sysdata is highly overloaded and currently
IOMMU is broken due to IOMMU code depending on this field.

This patch introduces new field in pci_dev's struct to
hold IOMMU specific per device IOMMU private data.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]

---
 drivers/pci/intel-iommu.c |   22 +++---
 include/linux/pci.h   |1 +
 2 files changed, 12 insertions(+), 11 deletions(-)

Index: 2.6-mm/drivers/pci/intel-iommu.c
===
--- 2.6-mm.orig/drivers/pci/intel-iommu.c   2007-10-03 13:48:18.0 
-0700
+++ 2.6-mm/drivers/pci/intel-iommu.c2007-10-03 13:48:41.0 -0700
@@ -1348,7 +1348,7 @@
list_del(info-link);
list_del(info-global);
if (info-dev)
-   info-dev-sysdata = NULL;
+   info-dev-iommu_private = NULL;
spin_unlock_irqrestore(device_domain_lock, flags);
 
detach_domain_for_dev(info-domain, info-bus, info-devfn);
@@ -1361,7 +1361,7 @@
 
 /*
  * find_domain
- * Note: we use struct pci_dev-sysdata stores the info
+ * Note: we use struct pci_dev-iommu_private stores the info
  */
 struct dmar_domain *
 find_domain(struct pci_dev *pdev)
@@ -1369,7 +1369,7 @@
struct device_domain_info *info;
 
/* No lock here, assumes no domain exit in normal case */
-   info = pdev-sysdata;
+   info = pdev-iommu_private;
if (info)
return info-domain;
return NULL;
@@ -1519,7 +1519,7 @@
}
list_add(info-link, domain-devices);
list_add(info-global, device_domain_list);
-   pdev-sysdata = info;
+   pdev-iommu_private = info;
spin_unlock_irqrestore(device_domain_lock, flags);
return domain;
 error:
@@ -1579,7 +1579,7 @@
 static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr,
struct pci_dev *pdev)
 {
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO)
return 0;
return iommu_prepare_identity_map(pdev, rmrr-base_address,
rmrr-end_address + 1);
@@ -1595,7 +1595,7 @@
int ret;
 
for_each_pci_dev(pdev) {
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO ||
+   if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO ||
!IS_GFX_DEVICE(pdev))
continue;
printk(KERN_INFO IOMMU: gfx device %s 1-1 mapping\n,
@@ -1836,7 +1836,7 @@
int prot = 0;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO)
return virt_to_bus(addr);
 
domain = get_valid_domain_for_dev(pdev);
@@ -1900,7 +1900,7 @@
unsigned long start_addr;
struct iova *iova;
 
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO)
return;
domain = find_domain(pdev);
BUG_ON(!domain);
@@ -1974,7 +1974,7 @@
size_t size = 0;
void *addr;
 
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO)
return;
 
domain = find_domain(pdev);
@@ -2032,7 +2032,7 @@
unsigned long start_addr;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO)
return intel_nontranslate_map_sg(hwdev, sg, nelems, dir);
 
domain = get_valid_domain_for_dev(pdev);
@@ -2234,7 +2234,7 @@
for (i = 0; i  drhd-devices_cnt; i++) {
if (!drhd-devices[i])
continue;
-   drhd-devices[i]-sysdata = DUMMY_DEVICE_DOMAIN_INFO;
+   drhd-devices[i]-iommu_private = 
DUMMY_DEVICE_DOMAIN_INFO;
}
}
 }
Index: 2.6-mm/include/linux/pci.h
===
--- 2.6-mm.orig/include/linux/pci.h 2007-10-03 13:48:20.0 -0700
+++ 2.6-mm/include/linux/pci.h  2007-10-03 13:49:08.0 -0700
@@ -195,6 +195,7 @@
 #ifdef CONFIG_PCI_MSI
struct list_head msi_list;
 #endif
+   void*iommu_private; /* hook for IOMMU specific extension */
 };
 
 extern struct pci_dev *alloc_pci_dev(void);
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH -mm] intel-iommu sg chaining support

2007-10-03 Thread Keshavamurthy, Anil S

On Wed, Oct 03, 2007 at 02:12:03PM -0700, Andrew Morton wrote:
 On Mon, 1 Oct 2007 09:12:56 -0700
 Keshavamurthy, Anil S [EMAIL PROTECTED] wrote:
 
  On Sat, Sep 29, 2007 at 05:16:38AM -0700, FUJITA Tomonori wrote:
   
  x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't
  support sg chaining, we will get data corruption.
  Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
  
  Acked-by: Anil S Keshavamurthy [EMAIL PROTECTED]
  
 
 Am I correct in believing that this patch is needed only when the
 chaining patches which are presently in git-block are combined with the
 intel-iommu work which is presently in -mm?
Yes, that is correct. 

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch take 2][Intel-IOMMU] Fix for IOMMU early crash

2007-10-03 Thread Keshavamurthy, Anil S

On Thu, Oct 04, 2007 at 11:19:33AM +1000, Benjamin Herrenschmidt wrote:
  Index: 2.6-mm/include/linux/pci.h
  ===
  --- 2.6-mm.orig/include/linux/pci.h 2007-10-03 13:48:20.0 -0700
  +++ 2.6-mm/include/linux/pci.h  2007-10-03 13:49:08.0 -0700
  @@ -195,6 +195,7 @@
   #ifdef CONFIG_PCI_MSI
  struct list_head msi_list;
   #endif
  +   void*iommu_private; /* hook for IOMMU specific extension */
   };
 
 I'm not fan of this. That would imply that iommu stuff is specific to
 PCI or that sort of thing.
 
 Why don't you use the new struct dev_archdata mechanism ? That's what I
 use on powerpc to provide optional iommu linkage to any device in the
 system.
Good one. I will certainly try out your idea and will update the list
tomorrow.

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH -mm] intel-iommu sg chaining support

2007-10-01 Thread Keshavamurthy, Anil S

On Sat, Sep 29, 2007 at 05:16:38AM -0700, FUJITA Tomonori wrote:
> 
>x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't
>support sg chaining, we will get data corruption.
>Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>

Acked-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

>---
> drivers/pci/intel-iommu.c |   32 
> 1 files changed, 16 insertions(+), 16 deletions(-)
>diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
>index dab329f..4668995 100644
>--- a/drivers/pci/intel-iommu.c
>+++ b/drivers/pci/intel-iommu.c
>@@ -1963,7 +1963,7 @@ static void intel_free_coherent(struct device *hwdev,
>size_t size,
> }
> #define  SG_ENT_VIRT_ADDRESS(sg) (page_address((sg)->page)  +
>(sg)->offset)
>-static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sg,
>+static  void  intel_unmap_sg(struct device *hwdev, struct scatterlist
>*sglist,
>int nelems, int dir)
> {
>int i;
>@@ -1973,16 +1973,17 @@ static void intel_unmap_sg(struct device *hwdev,
>struct scatterlist *sg,
>struct iova *iova;
>size_t size = 0;
>void *addr;
>+   struct scatterlist *sg;
>if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
>return;
>domain = find_domain(pdev);
>-   iova = find_iova(>iovad, IOVA_PFN(sg[0].dma_address));
>+   iova = find_iova(>iovad, IOVA_PFN(sglist[0].dma_address));
>if (!iova)
>return;
>-   for (i = 0; i < nelems; i++, sg++) {
>+   for_each_sg(sglist, sg, nelems, i) {
>addr = SG_ENT_VIRT_ADDRESS(sg);
>size += aligned_size((u64)addr, sg->length);
>}
>@@ -2003,20 +2004,20 @@ static void intel_unmap_sg(struct device *hwdev,
>struct scatterlist *sg,
> }
> static int intel_nontranslate_map_sg(struct device *hddev,
>-   struct scatterlist *sg, int nelems, int dir)
>+   struct scatterlist *sglist, int nelems, int dir)
> {
>int i;
>+   struct scatterlist *sg;
>-   for (i = 0; i < nelems; i++) {
>-   struct scatterlist *s = [i];
>-   BUG_ON(!s->page);
>-   s->dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(s));
>-   s->dma_length = s->length;
>+   for_each_sg(sglist, sg, nelems, i) {
>+   BUG_ON(!sg->page);
>+   sg->dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(sg));
>+   sg->dma_length = sg->length;
>}
>return nelems;
> }
>-static int intel_map_sg(struct device *hwdev, struct scatterlist *sg,
>+static int intel_map_sg(struct device *hwdev, struct scatterlist *sglist,
>int nelems, int dir)
> {
>void *addr;
>@@ -2028,18 +2029,18 @@ static int intel_map_sg(struct device *hwdev, 
> struct
>scatterlist *sg,
>size_t offset = 0;
>struct iova *iova = NULL;
>int ret;
>-   struct scatterlist *orig_sg = sg;
>+   struct scatterlist *sg;
>unsigned long start_addr;
>BUG_ON(dir == DMA_NONE);
>if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
>-   return intel_nontranslate_map_sg(hwdev, sg, nelems, dir);
>+   return intel_nontranslate_map_sg(hwdev, sglist, nelems,
>dir);
>domain = get_valid_domain_for_dev(pdev);
>if (!domain)
>return 0;
>-   for (i = 0; i < nelems; i++, sg++) {
>+   for_each_sg(sglist, sg, nelems, i) {
>addr = SG_ENT_VIRT_ADDRESS(sg);
>addr = (void *)virt_to_phys(addr);
>size += aligned_size((u64)addr, sg->length);
>@@ -2047,7 +2048,7 @@ static int intel_map_sg(struct device *hwdev, struct
>scatterlist *sg,
>iova = __intel_alloc_iova(hwdev, domain, size);
>if (!iova) {
>-   orig_sg->dma_length = 0;
>+   sglist->dma_length = 0;
>return 0;
>}
>@@ -2063,8 +2064,7 @@ static int intel_map_sg(struct device *hwdev, struct
>scatterlist *sg,
>start_addr = iova->pfn_lo << PAGE_SHIFT_4K;
>offset = 0;
>-   sg = orig_sg;
>-   for (i = 0; i < nelems; i++, sg++) {
>+   for_each_sg(sglist, sg, nelems, i) {
>addr = SG_ENT_VIRT_ADDRESS(sg);
>addr = (void *)virt_to_phys(addr);
>size = aligned_size((u64)addr, sg->length);
>--
>1.5.2.4
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH -mm] intel-iommu sg chaining support

2007-10-01 Thread Keshavamurthy, Anil S

On Sat, Sep 29, 2007 at 05:16:38AM -0700, FUJITA Tomonori wrote:
 
x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't
support sg chaining, we will get data corruption.
Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]

Acked-by: Anil S Keshavamurthy [EMAIL PROTECTED]

---
 drivers/pci/intel-iommu.c |   32 
 1 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index dab329f..4668995 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -1963,7 +1963,7 @@ static void intel_free_coherent(struct device *hwdev,
size_t size,
 }
 #define  SG_ENT_VIRT_ADDRESS(sg) (page_address((sg)-page)  +
(sg)-offset)
-static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sg,
+static  void  intel_unmap_sg(struct device *hwdev, struct scatterlist
*sglist,
int nelems, int dir)
 {
int i;
@@ -1973,16 +1973,17 @@ static void intel_unmap_sg(struct device *hwdev,
struct scatterlist *sg,
struct iova *iova;
size_t size = 0;
void *addr;
+   struct scatterlist *sg;
if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
return;
domain = find_domain(pdev);
-   iova = find_iova(domain-iovad, IOVA_PFN(sg[0].dma_address));
+   iova = find_iova(domain-iovad, IOVA_PFN(sglist[0].dma_address));
if (!iova)
return;
-   for (i = 0; i  nelems; i++, sg++) {
+   for_each_sg(sglist, sg, nelems, i) {
addr = SG_ENT_VIRT_ADDRESS(sg);
size += aligned_size((u64)addr, sg-length);
}
@@ -2003,20 +2004,20 @@ static void intel_unmap_sg(struct device *hwdev,
struct scatterlist *sg,
 }
 static int intel_nontranslate_map_sg(struct device *hddev,
-   struct scatterlist *sg, int nelems, int dir)
+   struct scatterlist *sglist, int nelems, int dir)
 {
int i;
+   struct scatterlist *sg;
-   for (i = 0; i  nelems; i++) {
-   struct scatterlist *s = sg[i];
-   BUG_ON(!s-page);
-   s-dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(s));
-   s-dma_length = s-length;
+   for_each_sg(sglist, sg, nelems, i) {
+   BUG_ON(!sg-page);
+   sg-dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(sg));
+   sg-dma_length = sg-length;
}
return nelems;
 }
-static int intel_map_sg(struct device *hwdev, struct scatterlist *sg,
+static int intel_map_sg(struct device *hwdev, struct scatterlist *sglist,
int nelems, int dir)
 {
void *addr;
@@ -2028,18 +2029,18 @@ static int intel_map_sg(struct device *hwdev, 
 struct
scatterlist *sg,
size_t offset = 0;
struct iova *iova = NULL;
int ret;
-   struct scatterlist *orig_sg = sg;
+   struct scatterlist *sg;
unsigned long start_addr;
BUG_ON(dir == DMA_NONE);
if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
-   return intel_nontranslate_map_sg(hwdev, sg, nelems, dir);
+   return intel_nontranslate_map_sg(hwdev, sglist, nelems,
dir);
domain = get_valid_domain_for_dev(pdev);
if (!domain)
return 0;
-   for (i = 0; i  nelems; i++, sg++) {
+   for_each_sg(sglist, sg, nelems, i) {
addr = SG_ENT_VIRT_ADDRESS(sg);
addr = (void *)virt_to_phys(addr);
size += aligned_size((u64)addr, sg-length);
@@ -2047,7 +2048,7 @@ static int intel_map_sg(struct device *hwdev, struct
scatterlist *sg,
iova = __intel_alloc_iova(hwdev, domain, size);
if (!iova) {
-   orig_sg-dma_length = 0;
+   sglist-dma_length = 0;
return 0;
}
@@ -2063,8 +2064,7 @@ static int intel_map_sg(struct device *hwdev, struct
scatterlist *sg,
start_addr = iova-pfn_lo  PAGE_SHIFT_4K;
offset = 0;
-   sg = orig_sg;
-   for (i = 0; i  nelems; i++, sg++) {
+   for_each_sg(sglist, sg, nelems, i) {
addr = SG_ENT_VIRT_ADDRESS(sg);
addr = (void *)virt_to_phys(addr);
size = aligned_size((u64)addr, sg-length);
--
1.5.2.4
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Keshavamurthy, Anil S

On Wed, Sep 26, 2007 at 10:09:33AM +0530, Ananth N Mavinakayanahalli wrote:
> On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
> > Hello,
> > I am trying to use kprobes to measure the latency of a function by
> > instrumenting its call site.  Basically, I find the call instruction,
> > and insert a kprobe with a pre-handler and post-handler at that point.
> > The pre-handler measures the latency (reads the TSC counter).  The
> > post-handler measures the latency again, and subtracts the value that
> > was read in the pre-handler to compute the total latency of the called
> > function.
> 
> This sounds ok...

So what you are really measuring is the latency of just that single 
instruction where you have inserted the probe i.e. because your
pre-handler is called just before the probed instruction is executed and
your post-handler is called right after you probed instruction is 
single-stepped.

> 
> > So to measure the latency of foo(), I basically want kprobes to do this:
> > pre_handler();
> > foo();
When you insert a probe, you are inserting  probe on an instruction boundary
and not at function level.
> > post_handler();

Hence the above looks like

pre-handler()
Probed-instruction; // most likely the first instruction in the foo();
post-hanlder()
rest-of-foo()

> > 
> > The problem is that the latencies that I am getting are consistently low
> > (~10,000 cycles).  When I manually instrument the functions, the latency
> > is about 20,000,000 cycles.  Clearly something is not right here.
As I mentioned above what you are seeing is the latency of just the
probed instruction and hence it is very very low compared to
the latency of the function foo().

> You could try a a couple of approaches for starters.
I agree with Ananth, you can try the below approaches
for your measurements.

> 
> a. As you mention above, a kprobe on the function invocation and the
> other on the instruction following the call; both need just pre_handlers. 
> 
> b.
> - Insert a kprobe and a kretprobe on foo()
> - The kprobe needs to have only a pre_handler that'll measure the latency
> - A similar handler for the kretprobe handler can measure the latency
> again and their difference will give you foo()'s latency.
> 
>  though will require you to do some housekeeping in case foo() is
> reentrant to track which return instance corresponds to which call.
> 
> Ananth
> 
> PS: There was a thought of providing a facility to run a handler at
> function entry even when just a kretprobe is used. Maybe we need to
> relook at that; it'd have been useful in this case.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Keshavamurthy, Anil S

On Wed, Sep 26, 2007 at 10:09:33AM +0530, Ananth N Mavinakayanahalli wrote:
 On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
  Hello,
  I am trying to use kprobes to measure the latency of a function by
  instrumenting its call site.  Basically, I find the call instruction,
  and insert a kprobe with a pre-handler and post-handler at that point.
  The pre-handler measures the latency (reads the TSC counter).  The
  post-handler measures the latency again, and subtracts the value that
  was read in the pre-handler to compute the total latency of the called
  function.
 
 This sounds ok...

So what you are really measuring is the latency of just that single 
instruction where you have inserted the probe i.e. because your
pre-handler is called just before the probed instruction is executed and
your post-handler is called right after you probed instruction is 
single-stepped.

 
  So to measure the latency of foo(), I basically want kprobes to do this:
  pre_handler();
  foo();
When you insert a probe, you are inserting  probe on an instruction boundary
and not at function level.
  post_handler();

Hence the above looks like

pre-handler()
Probed-instruction; // most likely the first instruction in the foo();
post-hanlder()
rest-of-foo()

  
  The problem is that the latencies that I am getting are consistently low
  (~10,000 cycles).  When I manually instrument the functions, the latency
  is about 20,000,000 cycles.  Clearly something is not right here.
As I mentioned above what you are seeing is the latency of just the
probed instruction and hence it is very very low compared to
the latency of the function foo().

 You could try a a couple of approaches for starters.
I agree with Ananth, you can try the below approaches
for your measurements.

 
 a. As you mention above, a kprobe on the function invocation and the
 other on the instruction following the call; both need just pre_handlers. 
 
 b.
 - Insert a kprobe and a kretprobe on foo()
 - The kprobe needs to have only a pre_handler that'll measure the latency
 - A similar handler for the kretprobe handler can measure the latency
 again and their difference will give you foo()'s latency.
 
 b though will require you to do some housekeeping in case foo() is
 reentrant to track which return instance corresponds to which call.
 
 Ananth
 
 PS: There was a thought of providing a facility to run a handler at
 function entry even when just a kretprobe is used. Maybe we need to
 relook at that; it'd have been useful in this case.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch][Intel-IOMMU] Fix for IOMMU early crash

2007-09-25 Thread Keshavamurthy, Anil S

On Sat, Sep 15, 2007 at 02:30:40AM +1000, Paul Mackerras wrote:
> Keshavamurthy, Anil S writes:
> 
> > Can I expect the ppc64 code changes from you? 
> > Once I get your, I will merge with mine and post it again.
> 
> Sure, but it will be next week, since I am travelling this week.
Any progress?

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch][Intel-IOMMU] Fix for IOMMU early crash

2007-09-25 Thread Keshavamurthy, Anil S

On Sat, Sep 15, 2007 at 02:30:40AM +1000, Paul Mackerras wrote:
 Keshavamurthy, Anil S writes:
 
  Can I expect the ppc64 code changes from you? 
  Once I get your, I will merge with mine and post it again.
 
 Sure, but it will be next week, since I am travelling this week.
Any progress?

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] Fix BIOS-e820 end address

2007-09-14 Thread Keshavamurthy, Anil S

On Fri, Sep 14, 2007 at 02:31:59PM -0700, Siddha, Suresh B wrote:
> On Fri, Sep 14, 2007 at 02:00:02PM -0700, Jeremy Fitzhardinge wrote:
> > Keshavamurthy, Anil S wrote:
> > > Subject: [patch] Fix BIOS-e820 end address
> > >
> > > --snip of boot message--
> > > BIOS-provided physical RAM map:
> > >  BIOS-e820:  - 000a (usable)
> > >  BIOS-e820: 000f - 0010 (reserved)
> > >  BIOS-e820: 0010 - 7fe8cc00 (usable)
> > > end snip---
> > >
> > > As you see from above the address 0010 is both
> > > shown as reserved and usable which is confusing.
> > >   
> > 
> > I think this is consistent with many other kernel interfaces (such as
> > /proc/X/maps) where the end address is taken to be exclusive: [0xf,
> > 0x10).
> 
> Andrew, Please disregard this patch. As Jermy, Jan pointed out, this
> will cause more confusions. Thanks. 
I agree, we can discard my patch.

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] Fix BIOS-e820 end address

2007-09-14 Thread Keshavamurthy, Anil S

On Fri, Sep 14, 2007 at 02:31:59PM -0700, Siddha, Suresh B wrote:
 On Fri, Sep 14, 2007 at 02:00:02PM -0700, Jeremy Fitzhardinge wrote:
  Keshavamurthy, Anil S wrote:
   Subject: [patch] Fix BIOS-e820 end address
  
   --snip of boot message--
   BIOS-provided physical RAM map:
BIOS-e820:  - 000a (usable)
BIOS-e820: 000f - 0010 (reserved)
BIOS-e820: 0010 - 7fe8cc00 (usable)
   end snip---
  
   As you see from above the address 0010 is both
   shown as reserved and usable which is confusing.
 
  
  I think this is consistent with many other kernel interfaces (such as
  /proc/X/maps) where the end address is taken to be exclusive: [0xf,
  0x10).
 
 Andrew, Please disregard this patch. As Jermy, Jan pointed out, this
 will cause more confusions. Thanks. 
I agree, we can discard my patch.

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch][Intel-IOMMU] Fix for IOMMU early crash

2007-09-11 Thread Keshavamurthy, Anil S

On Wed, Sep 12, 2007 at 05:48:52AM +1000, Paul Mackerras wrote:
> Keshavamurthy, Anil S writes:
> 
> > Subject: Fix IOMMU early crash
> > 
> > This patch avoids copying pci_bus's->sysdata to
> > pci_dev's->sysdata as one can easily obtain
> > the same through pci_dev->bus->sysdata.
> 
> At the moment this will cause ppc64 to crash, since we rely on
> pci_dev->sysdata pointing to some node in the firmware device tree,
> either the device's node or the node for a parent bus.
> 
> We could change the ppc64 code to use pci_dev->bus->sysdata in the
> case when pci_dev->sysdata is NULL, which would fix the problem.  I
> think that change should be incorporated as part of this patch so that
> we don't break git bisection.
Can I expect the ppc64 code changes from you? 
Once I get your, I will merge with mine and post it again.
> 
> In other words I don't want to see this patch applied as it stands.
Yup, I agree with you.

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch][Intel-IOMMU] Fix for IOMMU early crash

2007-09-11 Thread Keshavamurthy, Anil S

On Wed, Sep 12, 2007 at 05:48:52AM +1000, Paul Mackerras wrote:
> Keshavamurthy, Anil S writes:
> 
> > Subject: Fix IOMMU early crash
> > 
> > This patch avoids copying pci_bus's->sysdata to
> > pci_dev's->sysdata as one can easily obtain
> > the same through pci_dev->bus->sysdata.
> 
> At the moment this will cause ppc64 to crash, since we rely on
> pci_dev->sysdata pointing to some node in the firmware device tree,
> either the device's node or the node for a parent bus.
> 
> We could change the ppc64 code to use pci_dev->bus->sysdata in the
> case when pci_dev->sysdata is NULL, which would fix the problem.  I
> think that change should be incorporated as part of this patch so that
> we don't break git bisection.

Why do you want to check if pci_dev->sysdata is NULL then use
pci_dev->bus->sysdata else pci_dev->sysdata? If you change this
to always use pci_dev->bus->sysdata, then you don;t have to depend
on my patch and your patch can get in independent of mine.

> 
> In other words I don't want to see this patch applied as it stands.
Is it possible to post your patch anytime soon? Or feel free to combine
mine with yours and post it with your signed-off-by.

Thanks,
Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch][Intel-IOMMU] Fix for IOMMU early crash

2007-09-11 Thread Keshavamurthy, Anil S

Subject: Fix IOMMU early crash

This patch avoids copying pci_bus's->sysdata to
pci_dev's->sysdata as one can easily obtain
the same through pci_dev->bus->sysdata.

Now with some recent pci_sysdata patches,
the bus's->sysdata gets populated way early
and a value of non-NULL gets copied from
bus's->sysdata to pci_dev's->sysdata
which causes IOMMU to crash way early in the
boot time as IOMMU depends on this field for its
per device IOMMU context.

Hence this patch not only makes pci_dev's->sysdata
useful and but also fixes the IOMMU early crash.

This patch needs to be applied before IOMMU patches,
so that git bisect will not fail on IOMMU code :-)

Tested on x86_64.

Please apply.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 drivers/pci/hotplug/fakephp.c |1 -
 drivers/pci/probe.c   |1 -
 2 files changed, 2 deletions(-)

Index: work/drivers/pci/hotplug/fakephp.c
===
--- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-11 10:29:30.0 
-0700
+++ work/drivers/pci/hotplug/fakephp.c  2007-09-11 10:35:22.0 -0700
@@ -243,7 +243,6 @@
return;
 
dev->bus = (struct pci_bus*)bus;
-   dev->sysdata = bus->sysdata;
for (devfn = 0; devfn < 0x100; devfn += 8) {
dev->devfn = devfn;
pci_rescan_slot(dev);
Index: work/drivers/pci/probe.c
===
--- work.orig/drivers/pci/probe.c   2007-09-11 10:29:30.0 -0700
+++ work/drivers/pci/probe.c2007-09-11 10:35:22.0 -0700
@@ -994,7 +994,6 @@
return NULL;
 
dev->bus = bus;
-   dev->sysdata = bus->sysdata;
dev->dev.parent = bus->bridge;
dev->dev.bus = _bus_type;
dev->devfn = devfn;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch][Intel-IOMMU] Fix for IOMMU early crash

2007-09-11 Thread Keshavamurthy, Anil S

Subject: Fix IOMMU early crash

This patch avoids copying pci_bus's-sysdata to
pci_dev's-sysdata as one can easily obtain
the same through pci_dev-bus-sysdata.

Now with some recent pci_sysdata patches,
the bus's-sysdata gets populated way early
and a value of non-NULL gets copied from
bus's-sysdata to pci_dev's-sysdata
which causes IOMMU to crash way early in the
boot time as IOMMU depends on this field for its
per device IOMMU context.

Hence this patch not only makes pci_dev's-sysdata
useful and but also fixes the IOMMU early crash.

This patch needs to be applied before IOMMU patches,
so that git bisect will not fail on IOMMU code :-)

Tested on x86_64.

Please apply.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]

---
 drivers/pci/hotplug/fakephp.c |1 -
 drivers/pci/probe.c   |1 -
 2 files changed, 2 deletions(-)

Index: work/drivers/pci/hotplug/fakephp.c
===
--- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-11 10:29:30.0 
-0700
+++ work/drivers/pci/hotplug/fakephp.c  2007-09-11 10:35:22.0 -0700
@@ -243,7 +243,6 @@
return;
 
dev-bus = (struct pci_bus*)bus;
-   dev-sysdata = bus-sysdata;
for (devfn = 0; devfn  0x100; devfn += 8) {
dev-devfn = devfn;
pci_rescan_slot(dev);
Index: work/drivers/pci/probe.c
===
--- work.orig/drivers/pci/probe.c   2007-09-11 10:29:30.0 -0700
+++ work/drivers/pci/probe.c2007-09-11 10:35:22.0 -0700
@@ -994,7 +994,6 @@
return NULL;
 
dev-bus = bus;
-   dev-sysdata = bus-sysdata;
dev-dev.parent = bus-bridge;
dev-dev.bus = pci_bus_type;
dev-devfn = devfn;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch][Intel-IOMMU] Fix for IOMMU early crash

2007-09-11 Thread Keshavamurthy, Anil S

On Wed, Sep 12, 2007 at 05:48:52AM +1000, Paul Mackerras wrote:
 Keshavamurthy, Anil S writes:
 
  Subject: Fix IOMMU early crash
  
  This patch avoids copying pci_bus's-sysdata to
  pci_dev's-sysdata as one can easily obtain
  the same through pci_dev-bus-sysdata.
 
 At the moment this will cause ppc64 to crash, since we rely on
 pci_dev-sysdata pointing to some node in the firmware device tree,
 either the device's node or the node for a parent bus.
 
 We could change the ppc64 code to use pci_dev-bus-sysdata in the
 case when pci_dev-sysdata is NULL, which would fix the problem.  I
 think that change should be incorporated as part of this patch so that
 we don't break git bisection.

Why do you want to check if pci_dev-sysdata is NULL then use
pci_dev-bus-sysdata else pci_dev-sysdata? If you change this
to always use pci_dev-bus-sysdata, then you don;t have to depend
on my patch and your patch can get in independent of mine.

 
 In other words I don't want to see this patch applied as it stands.
Is it possible to post your patch anytime soon? Or feel free to combine
mine with yours and post it with your signed-off-by.

Thanks,
Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch][Intel-IOMMU] Fix for IOMMU early crash

2007-09-11 Thread Keshavamurthy, Anil S

On Wed, Sep 12, 2007 at 05:48:52AM +1000, Paul Mackerras wrote:
 Keshavamurthy, Anil S writes:
 
  Subject: Fix IOMMU early crash
  
  This patch avoids copying pci_bus's-sysdata to
  pci_dev's-sysdata as one can easily obtain
  the same through pci_dev-bus-sysdata.
 
 At the moment this will cause ppc64 to crash, since we rely on
 pci_dev-sysdata pointing to some node in the firmware device tree,
 either the device's node or the node for a parent bus.
 
 We could change the ppc64 code to use pci_dev-bus-sysdata in the
 case when pci_dev-sysdata is NULL, which would fix the problem.  I
 think that change should be incorporated as part of this patch so that
 we don't break git bisection.
Can I expect the ppc64 code changes from you? 
Once I get your, I will merge with mine and post it again.
 
 In other words I don't want to see this patch applied as it stands.
Yup, I agree with you.

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-10 Thread Keshavamurthy, Anil S

On Mon, Sep 10, 2007 at 11:25:43PM +0300, Muli Ben-Yehuda wrote:
> On Tue, Sep 11, 2007 at 10:42:31AM -0700, Keshavamurthy, Anil S wrote:
> 
> > Yes, I agree that pci_dev->sysdata can;t be removed. Even we (IOMMU)
> > were dependent on this field but somehow this field is being
> > overwritten to point to pci_bus's->sysdata and hence IOMMU was
> > failing. Earlier it was overwritten to NULL and hence we were not
> > failing but now it is overwritten to non-NULL and hence we fail.
> 
> Do you know which commit caused that change?
> 
> > My therory is that we don;t need to copy pci_bus's->sysdata to 
> > pci_dev's->sysdata. Below patch solves my problem.
> > Any objection to below patch?
> 
> I will give it a spin to verify it works for me, but in general I am
> wary of making such changes unless we can verify (read: audit) that
> they have no adverse side effects *on all architectures*.
Thanks Muli for your help here. I tested on x86_64 and saw no
issues. Looking at the code, pci_dev's->sysdata becomes useless 
if the intent here is to keep a copy of it's bus's->sysdata as
the same can be obtained from pci_dev->bus->sysdata.

Thanks,
Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-10 Thread Keshavamurthy, Anil S

On Mon, Sep 10, 2007 at 03:37:48AM +1000, Paul Mackerras wrote:
> Keshavamurthy, Anil S writes:
> 
> > Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash
> > 
> > Populating pci_bus->sysdata way early in the pci discovery phase
> > sets NON-NULL value to pci_dev->sysdata which breaks the assumption
> > in the Intel IOMMU driver and crashes the system.
> > 
> > 
> > In the drivers/pci/probe.c, pci_dev->sysdata gets a copy of
> > its pci_bus->sysdata which is not required as
> > the same can be obtained from pci_dev->bus->sysdata. More over 
> > the left hand assignment of pci_dev->sysdata is never being used,
> 
> Wrong.  You needed to grep a bit more widely...
Ah..Thanks for pointing this out. sorry I had checked only i386 and x86_64.
> 
> > so their is no point is setting 
> > pci_dev->sysdata = pci_bus->sysdata;
> > 
> > This patch removes sysdata from pci_dev struct and creates a new
> > field called sys_data which is exclusively used 
> > by IOMMU driver to keep its per device context pointer.
> 
> This will break powerpc, because we use the pci_dev->sysdata field to
> point to a firmware device tree node.  Please figure out another way
> to solve your problem.

Yes, I agree that pci_dev->sysdata can;t be removed. Even we (IOMMU)
were dependent on this field but somehow this field is being overwritten
to point to pci_bus's->sysdata and hence IOMMU was failing. Earlier
it was overwritten to NULL and hence we were not failing but now it
is overwritten to non-NULL and hence we fail.

My therory is that we don;t need to copy pci_bus's->sysdata to 
pci_dev's->sysdata. Below patch solves my problem.
Any objection to below patch?

---
 drivers/pci/hotplug/fakephp.c |1 -
 drivers/pci/probe.c   |1 -
 2 files changed, 2 deletions(-)

Index: work/drivers/pci/hotplug/fakephp.c
===
--- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-11 10:29:30.0 
-0700
+++ work/drivers/pci/hotplug/fakephp.c  2007-09-11 10:35:22.0 -0700
@@ -243,7 +243,6 @@
return;
 
dev->bus = (struct pci_bus*)bus;
-   dev->sysdata = bus->sysdata;
for (devfn = 0; devfn < 0x100; devfn += 8) {
dev->devfn = devfn;
pci_rescan_slot(dev);
Index: work/drivers/pci/probe.c
===
--- work.orig/drivers/pci/probe.c   2007-09-11 10:29:30.0 -0700
+++ work/drivers/pci/probe.c2007-09-11 10:35:22.0 -0700
@@ -994,7 +994,6 @@
return NULL;
 
dev->bus = bus;
-   dev->sysdata = bus->sysdata;
dev->dev.parent = bus->bridge;
dev->dev.bus = _bus_type;
dev->devfn = devfn;



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-10 Thread Keshavamurthy, Anil S

On Sun, Sep 09, 2007 at 08:51:40PM +0300, Muli Ben-Yehuda wrote:
> On Mon, Sep 10, 2007 at 08:43:59AM -0700, Keshavamurthy, Anil S wrote:
> > On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote:
> > > On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote:
> > > 
> > > > Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash
> > > 
> > > This patch feels like a huge hack. See below.
> >
> > You seem to be jumping to conclusion without going in detail. The
> > pci_dev struct contains pointer to sysdata, which in turn points to
> > the copy of its parent's bus sysdata.  So technically speaking we
> > can eliminate sysdata pointer from pci_dev struct which is what one
> > portion of this patch does.
> 
> ... provided nothing relies on this relationship or the existence of
> the pci_dev's sysdata. Have you audited every architecture's use of
> the sysdata pointers?
> 
> > > > This patch removes sysdata from pci_dev struct and creates a new
> > > > field called sys_data which is exclusively used by IOMMU driver to
> > > > keep its per device context pointer.
> > > 
> > > Hmpf, why is this needed? with the pci_sysdata work that recently went
> > > into mainline we have a void *iommu member in pci_sysdata which should
> > > be all that's needed. Please elaborate if it's not enough for your
> > > needs.
> 
> > I looked at your patch and it was not suitable because I need to
> > store iommu private pointer in pci_dev
> 
> Could you elaborate on why you need this? I'm assuming it's for the
> per-device IOMMU page tables?
Yes, it is for per-device IOMMU domain information which internally holds
info about the page tables.
> 
> > and not in the pci_bus. So I have added a new member sys_data in the
> > pci_dev struct. I can change the name from sys_dev to iomu_priv to
> > clear the confusion. Do let me know.
> 
> Well, you should be able to just use the pci_dev's ->sysdata (that's
> what it's there for after all!) but you might need to make it point to
> a structure if it's shared, the same way we did with the bus's
> ->sysdata. 
How about adding a new field as I don't care about KABI for mainline?

>I agree that just having it point to the bus's ->sysdata is
> not very useful *but* there may be code in the kernel that relies on
> it (Calgary did until very recently...) so it would have to be audited
> first.
I still wonder copying bus's->sysdata to pci_dev's-> sysdata is any useful?

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-10 Thread Keshavamurthy, Anil S

On Sun, Sep 09, 2007 at 08:51:40PM +0300, Muli Ben-Yehuda wrote:
 On Mon, Sep 10, 2007 at 08:43:59AM -0700, Keshavamurthy, Anil S wrote:
  On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote:
   On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote:
   
Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash
   
   This patch feels like a huge hack. See below.
 
  You seem to be jumping to conclusion without going in detail. The
  pci_dev struct contains pointer to sysdata, which in turn points to
  the copy of its parent's bus sysdata.  So technically speaking we
  can eliminate sysdata pointer from pci_dev struct which is what one
  portion of this patch does.
 
 ... provided nothing relies on this relationship or the existence of
 the pci_dev's sysdata. Have you audited every architecture's use of
 the sysdata pointers?
 
This patch removes sysdata from pci_dev struct and creates a new
field called sys_data which is exclusively used by IOMMU driver to
keep its per device context pointer.
   
   Hmpf, why is this needed? with the pci_sysdata work that recently went
   into mainline we have a void *iommu member in pci_sysdata which should
   be all that's needed. Please elaborate if it's not enough for your
   needs.
 
  I looked at your patch and it was not suitable because I need to
  store iommu private pointer in pci_dev
 
 Could you elaborate on why you need this? I'm assuming it's for the
 per-device IOMMU page tables?
Yes, it is for per-device IOMMU domain information which internally holds
info about the page tables.
 
  and not in the pci_bus. So I have added a new member sys_data in the
  pci_dev struct. I can change the name from sys_dev to iomu_priv to
  clear the confusion. Do let me know.
 
 Well, you should be able to just use the pci_dev's -sysdata (that's
 what it's there for after all!) but you might need to make it point to
 a structure if it's shared, the same way we did with the bus's
 -sysdata. 
How about adding a new field as I don't care about KABI for mainline?

I agree that just having it point to the bus's -sysdata is
 not very useful *but* there may be code in the kernel that relies on
 it (Calgary did until very recently...) so it would have to be audited
 first.
I still wonder copying bus's-sysdata to pci_dev's- sysdata is any useful?

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-10 Thread Keshavamurthy, Anil S

On Mon, Sep 10, 2007 at 03:37:48AM +1000, Paul Mackerras wrote:
 Keshavamurthy, Anil S writes:
 
  Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash
  
  Populating pci_bus-sysdata way early in the pci discovery phase
  sets NON-NULL value to pci_dev-sysdata which breaks the assumption
  in the Intel IOMMU driver and crashes the system.
  
  
  In the drivers/pci/probe.c, pci_dev-sysdata gets a copy of
  its pci_bus-sysdata which is not required as
  the same can be obtained from pci_dev-bus-sysdata. More over 
  the left hand assignment of pci_dev-sysdata is never being used,
 
 Wrong.  You needed to grep a bit more widely...
Ah..Thanks for pointing this out. sorry I had checked only i386 and x86_64.
 
  so their is no point is setting 
  pci_dev-sysdata = pci_bus-sysdata;
  
  This patch removes sysdata from pci_dev struct and creates a new
  field called sys_data which is exclusively used 
  by IOMMU driver to keep its per device context pointer.
 
 This will break powerpc, because we use the pci_dev-sysdata field to
 point to a firmware device tree node.  Please figure out another way
 to solve your problem.

Yes, I agree that pci_dev-sysdata can;t be removed. Even we (IOMMU)
were dependent on this field but somehow this field is being overwritten
to point to pci_bus's-sysdata and hence IOMMU was failing. Earlier
it was overwritten to NULL and hence we were not failing but now it
is overwritten to non-NULL and hence we fail.

My therory is that we don;t need to copy pci_bus's-sysdata to 
pci_dev's-sysdata. Below patch solves my problem.
Any objection to below patch?

---
 drivers/pci/hotplug/fakephp.c |1 -
 drivers/pci/probe.c   |1 -
 2 files changed, 2 deletions(-)

Index: work/drivers/pci/hotplug/fakephp.c
===
--- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-11 10:29:30.0 
-0700
+++ work/drivers/pci/hotplug/fakephp.c  2007-09-11 10:35:22.0 -0700
@@ -243,7 +243,6 @@
return;
 
dev-bus = (struct pci_bus*)bus;
-   dev-sysdata = bus-sysdata;
for (devfn = 0; devfn  0x100; devfn += 8) {
dev-devfn = devfn;
pci_rescan_slot(dev);
Index: work/drivers/pci/probe.c
===
--- work.orig/drivers/pci/probe.c   2007-09-11 10:29:30.0 -0700
+++ work/drivers/pci/probe.c2007-09-11 10:35:22.0 -0700
@@ -994,7 +994,6 @@
return NULL;
 
dev-bus = bus;
-   dev-sysdata = bus-sysdata;
dev-dev.parent = bus-bridge;
dev-dev.bus = pci_bus_type;
dev-devfn = devfn;



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-10 Thread Keshavamurthy, Anil S

On Mon, Sep 10, 2007 at 11:25:43PM +0300, Muli Ben-Yehuda wrote:
 On Tue, Sep 11, 2007 at 10:42:31AM -0700, Keshavamurthy, Anil S wrote:
 
  Yes, I agree that pci_dev-sysdata can;t be removed. Even we (IOMMU)
  were dependent on this field but somehow this field is being
  overwritten to point to pci_bus's-sysdata and hence IOMMU was
  failing. Earlier it was overwritten to NULL and hence we were not
  failing but now it is overwritten to non-NULL and hence we fail.
 
 Do you know which commit caused that change?
 
  My therory is that we don;t need to copy pci_bus's-sysdata to 
  pci_dev's-sysdata. Below patch solves my problem.
  Any objection to below patch?
 
 I will give it a spin to verify it works for me, but in general I am
 wary of making such changes unless we can verify (read: audit) that
 they have no adverse side effects *on all architectures*.
Thanks Muli for your help here. I tested on x86_64 and saw no
issues. Looking at the code, pci_dev's-sysdata becomes useless 
if the intent here is to keep a copy of it's bus's-sysdata as
the same can be obtained from pci_dev-bus-sysdata.

Thanks,
Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-09 Thread Keshavamurthy, Anil S

On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote:
> On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote:
> 
> > Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash
> 
> This patch feels like a huge hack. See below.
You seem to be jumping to conclusion without going
in detail. The pci_dev struct contains pointer to sysdata,
which in turn points to the copy of its parent's bus sysdata.
So technically speaking we can eliminate sysdata pointer from
pci_dev struct which is what one portion of this patch does.

> 
> > This patch removes sysdata from pci_dev struct and creates a new
> > field called sys_data which is exclusively used by IOMMU driver to
> > keep its per device context pointer.
> 
> Hmpf, why is this needed? with the pci_sysdata work that recently went
> into mainline we have a void *iommu member in pci_sysdata which should
> be all that's needed. Please elaborate if it's not enough for your
> needs.
I looked at your patch and it was not suitable because I need to store
iommu private pointer in pci_dev and not in the pci_bus. So I have added
a new member sys_data in the pci_dev struct. I can change the name from 
sys_dev to iomu_priv to clear the confusion. Do let me know.


-Anil






> 
> Thanks,
> Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-09 Thread Keshavamurthy, Anil S

On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote:
 On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote:
 
  Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash
 
 This patch feels like a huge hack. See below.
You seem to be jumping to conclusion without going
in detail. The pci_dev struct contains pointer to sysdata,
which in turn points to the copy of its parent's bus sysdata.
So technically speaking we can eliminate sysdata pointer from
pci_dev struct which is what one portion of this patch does.

 
  This patch removes sysdata from pci_dev struct and creates a new
  field called sys_data which is exclusively used by IOMMU driver to
  keep its per device context pointer.
 
 Hmpf, why is this needed? with the pci_sysdata work that recently went
 into mainline we have a void *iommu member in pci_sysdata which should
 be all that's needed. Please elaborate if it's not enough for your
 needs.
I looked at your patch and it was not suitable because I need to store
iommu private pointer in pci_dev and not in the pci_bus. So I have added
a new member sys_data in the pci_dev struct. I can change the name from 
sys_dev to iomu_priv to clear the confusion. Do let me know.


-Anil






 
 Thanks,
 Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] Fix BIOS-e820 end address

2007-09-07 Thread Keshavamurthy, Anil S

Subject: [patch] Fix BIOS-e820 end address

--snip of boot message--
BIOS-provided physical RAM map:
 BIOS-e820:  - 000a (usable)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 7fe8cc00 (usable)
end snip---

As you see from above the address 0010 is both
shown as reserved and usable which is confusing.

This patch fixes the BIOS-e820 end address.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 arch/i386/kernel/e820.c   |2 +-
 arch/x86_64/kernel/e820.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: work/arch/i386/kernel/e820.c
===
--- work.orig/arch/i386/kernel/e820.c   2007-09-08 12:00:33.0 -0700
+++ work/arch/i386/kernel/e820.c2007-09-08 13:39:12.0 -0700
@@ -753,7 +753,7 @@
for (i = 0; i < e820.nr_map; i++) {
printk(" %s: %016Lx - %016Lx ", who,
e820.map[i].addr,
-   e820.map[i].addr + e820.map[i].size);
+   e820.map[i].addr + e820.map[i].size - 1);
switch (e820.map[i].type) {
case E820_RAM:  printk("(usable)\n");
break;
Index: work/arch/x86_64/kernel/e820.c
===
--- work.orig/arch/x86_64/kernel/e820.c 2007-09-08 12:00:46.0 -0700
+++ work/arch/x86_64/kernel/e820.c  2007-09-08 13:38:57.0 -0700
@@ -368,7 +368,7 @@
for (i = 0; i < e820.nr_map; i++) {
printk(KERN_INFO " %s: %016Lx - %016Lx ", who,
(unsigned long long) e820.map[i].addr,
-   (unsigned long long) (e820.map[i].addr + 
e820.map[i].size));
+   (unsigned long long) (e820.map[i].addr + 
e820.map[i].size - 1));
switch (e820.map[i].type) {
case E820_RAM:  printk("(usable)\n");
break;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-07 Thread Keshavamurthy, Anil S

Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash

Populating pci_bus->sysdata way early in the pci discovery phase
sets NON-NULL value to pci_dev->sysdata which breaks the assumption
in the Intel IOMMU driver and crashes the system.


In the drivers/pci/probe.c, pci_dev->sysdata gets a copy of
its pci_bus->sysdata which is not required as
the same can be obtained from pci_dev->bus->sysdata. More over 
the left hand assignment of pci_dev->sysdata is never being used,
so their is no point is setting 
pci_dev->sysdata = pci_bus->sysdata;

This patch removes sysdata from pci_dev struct and creates a new
field called sys_data which is exclusively used 
by IOMMU driver to keep its per device context pointer.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 drivers/pci/hotplug/fakephp.c |1 -
 drivers/pci/intel-iommu.c |   22 +++---
 drivers/pci/probe.c   |1 -
 include/linux/pci.h   |2 +-
 4 files changed, 12 insertions(+), 14 deletions(-)

Index: work/drivers/pci/hotplug/fakephp.c
===
--- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-08 12:00:20.0 
-0700
+++ work/drivers/pci/hotplug/fakephp.c  2007-09-08 12:07:19.0 -0700
@@ -243,7 +243,6 @@
return;
 
dev->bus = (struct pci_bus*)bus;
-   dev->sysdata = bus->sysdata;
for (devfn = 0; devfn < 0x100; devfn += 8) {
dev->devfn = devfn;
pci_rescan_slot(dev);
Index: work/drivers/pci/intel-iommu.c
===
--- work.orig/drivers/pci/intel-iommu.c 2007-09-08 12:00:47.0 -0700
+++ work/drivers/pci/intel-iommu.c  2007-09-08 12:08:20.0 -0700
@@ -1348,7 +1348,7 @@
list_del(>link);
list_del(>global);
if (info->dev)
-   info->dev->sysdata = NULL;
+   info->dev->sys_data = NULL;
spin_unlock_irqrestore(_domain_lock, flags);
 
detach_domain_for_dev(info->domain, info->bus, info->devfn);
@@ -1361,7 +1361,7 @@
 
 /*
  * find_domain
- * Note: we use struct pci_dev->sysdata stores the info
+ * Note: we use struct pci_dev->sys_data stores the info
  */
 struct dmar_domain *
 find_domain(struct pci_dev *pdev)
@@ -1369,7 +1369,7 @@
struct device_domain_info *info;
 
/* No lock here, assumes no domain exit in normal case */
-   info = pdev->sysdata;
+   info = pdev->sys_data;
if (info)
return info->domain;
return NULL;
@@ -1519,7 +1519,7 @@
}
list_add(>link, >devices);
list_add(>global, _domain_list);
-   pdev->sysdata = info;
+   pdev->sys_data = info;
spin_unlock_irqrestore(_domain_lock, flags);
return domain;
 error:
@@ -1579,7 +1579,7 @@
 static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr,
struct pci_dev *pdev)
 {
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO)
return 0;
return iommu_prepare_identity_map(pdev, rmrr->base_address,
rmrr->end_address + 1);
@@ -1595,7 +1595,7 @@
int ret;
 
for_each_pci_dev(pdev) {
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO ||
+   if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO ||
!IS_GFX_DEVICE(pdev))
continue;
printk(KERN_INFO "IOMMU: gfx device %s 1-1 mapping\n",
@@ -1836,7 +1836,7 @@
int prot = 0;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO)
return virt_to_bus(addr);
 
domain = get_valid_domain_for_dev(pdev);
@@ -1900,7 +1900,7 @@
unsigned long start_addr;
struct iova *iova;
 
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO)
return;
domain = find_domain(pdev);
BUG_ON(!domain);
@@ -1974,7 +1974,7 @@
size_t size = 0;
void *addr;
 
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO)
return;
 
domain = find_domain(pdev);
@@ -2032,7 +2032,7 @@
unsigned long start_addr;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO)
return intel_nontranslate_map_sg(hwdev, sg, nelems, dir);
 
domain = get_valid_domain_for_dev(pdev);
@@ -2234,7 +2234,7 @@
for (i = 0; i < drhd->devices_cnt; i++) {
if (!drhd->devices[i])
continue;
-   drhd->devices[i]->sysdata =

[RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-07 Thread Keshavamurthy, Anil S

Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash

Populating pci_bus-sysdata way early in the pci discovery phase
sets NON-NULL value to pci_dev-sysdata which breaks the assumption
in the Intel IOMMU driver and crashes the system.


In the drivers/pci/probe.c, pci_dev-sysdata gets a copy of
its pci_bus-sysdata which is not required as
the same can be obtained from pci_dev-bus-sysdata. More over 
the left hand assignment of pci_dev-sysdata is never being used,
so their is no point is setting 
pci_dev-sysdata = pci_bus-sysdata;

This patch removes sysdata from pci_dev struct and creates a new
field called sys_data which is exclusively used 
by IOMMU driver to keep its per device context pointer.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]

---
 drivers/pci/hotplug/fakephp.c |1 -
 drivers/pci/intel-iommu.c |   22 +++---
 drivers/pci/probe.c   |1 -
 include/linux/pci.h   |2 +-
 4 files changed, 12 insertions(+), 14 deletions(-)

Index: work/drivers/pci/hotplug/fakephp.c
===
--- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-08 12:00:20.0 
-0700
+++ work/drivers/pci/hotplug/fakephp.c  2007-09-08 12:07:19.0 -0700
@@ -243,7 +243,6 @@
return;
 
dev-bus = (struct pci_bus*)bus;
-   dev-sysdata = bus-sysdata;
for (devfn = 0; devfn  0x100; devfn += 8) {
dev-devfn = devfn;
pci_rescan_slot(dev);
Index: work/drivers/pci/intel-iommu.c
===
--- work.orig/drivers/pci/intel-iommu.c 2007-09-08 12:00:47.0 -0700
+++ work/drivers/pci/intel-iommu.c  2007-09-08 12:08:20.0 -0700
@@ -1348,7 +1348,7 @@
list_del(info-link);
list_del(info-global);
if (info-dev)
-   info-dev-sysdata = NULL;
+   info-dev-sys_data = NULL;
spin_unlock_irqrestore(device_domain_lock, flags);
 
detach_domain_for_dev(info-domain, info-bus, info-devfn);
@@ -1361,7 +1361,7 @@
 
 /*
  * find_domain
- * Note: we use struct pci_dev-sysdata stores the info
+ * Note: we use struct pci_dev-sys_data stores the info
  */
 struct dmar_domain *
 find_domain(struct pci_dev *pdev)
@@ -1369,7 +1369,7 @@
struct device_domain_info *info;
 
/* No lock here, assumes no domain exit in normal case */
-   info = pdev-sysdata;
+   info = pdev-sys_data;
if (info)
return info-domain;
return NULL;
@@ -1519,7 +1519,7 @@
}
list_add(info-link, domain-devices);
list_add(info-global, device_domain_list);
-   pdev-sysdata = info;
+   pdev-sys_data = info;
spin_unlock_irqrestore(device_domain_lock, flags);
return domain;
 error:
@@ -1579,7 +1579,7 @@
 static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr,
struct pci_dev *pdev)
 {
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO)
return 0;
return iommu_prepare_identity_map(pdev, rmrr-base_address,
rmrr-end_address + 1);
@@ -1595,7 +1595,7 @@
int ret;
 
for_each_pci_dev(pdev) {
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO ||
+   if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO ||
!IS_GFX_DEVICE(pdev))
continue;
printk(KERN_INFO IOMMU: gfx device %s 1-1 mapping\n,
@@ -1836,7 +1836,7 @@
int prot = 0;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO)
return virt_to_bus(addr);
 
domain = get_valid_domain_for_dev(pdev);
@@ -1900,7 +1900,7 @@
unsigned long start_addr;
struct iova *iova;
 
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO)
return;
domain = find_domain(pdev);
BUG_ON(!domain);
@@ -1974,7 +1974,7 @@
size_t size = 0;
void *addr;
 
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO)
return;
 
domain = find_domain(pdev);
@@ -2032,7 +2032,7 @@
unsigned long start_addr;
 
BUG_ON(dir == DMA_NONE);
-   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO)
return intel_nontranslate_map_sg(hwdev, sg, nelems, dir);
 
domain = get_valid_domain_for_dev(pdev);
@@ -2234,7 +2234,7 @@
for (i = 0; i  drhd-devices_cnt; i++) {
if (!drhd-devices[i])
continue;
-   drhd-devices[i]-sysdata =

[patch] Fix BIOS-e820 end address

2007-09-07 Thread Keshavamurthy, Anil S

Subject: [patch] Fix BIOS-e820 end address

--snip of boot message--
BIOS-provided physical RAM map:
 BIOS-e820:  - 000a (usable)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 7fe8cc00 (usable)
end snip---

As you see from above the address 0010 is both
shown as reserved and usable which is confusing.

This patch fixes the BIOS-e820 end address.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]

---
 arch/i386/kernel/e820.c   |2 +-
 arch/x86_64/kernel/e820.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: work/arch/i386/kernel/e820.c
===
--- work.orig/arch/i386/kernel/e820.c   2007-09-08 12:00:33.0 -0700
+++ work/arch/i386/kernel/e820.c2007-09-08 13:39:12.0 -0700
@@ -753,7 +753,7 @@
for (i = 0; i  e820.nr_map; i++) {
printk( %s: %016Lx - %016Lx , who,
e820.map[i].addr,
-   e820.map[i].addr + e820.map[i].size);
+   e820.map[i].addr + e820.map[i].size - 1);
switch (e820.map[i].type) {
case E820_RAM:  printk((usable)\n);
break;
Index: work/arch/x86_64/kernel/e820.c
===
--- work.orig/arch/x86_64/kernel/e820.c 2007-09-08 12:00:46.0 -0700
+++ work/arch/x86_64/kernel/e820.c  2007-09-08 13:38:57.0 -0700
@@ -368,7 +368,7 @@
for (i = 0; i  e820.nr_map; i++) {
printk(KERN_INFO  %s: %016Lx - %016Lx , who,
(unsigned long long) e820.map[i].addr,
-   (unsigned long long) (e820.map[i].addr + 
e820.map[i].size));
+   (unsigned long long) (e820.map[i].addr + 
e820.map[i].size - 1));
switch (e820.map[i].type) {
case E820_RAM:  printk((usable)\n);
break;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch -mm][Intel-IOMMU] Optimize sg map/unmap calls

2007-08-02 Thread Keshavamurthy, Anil S

On Wed, Aug 01, 2007 at 04:45:54PM -0700, Andrew Morton wrote:
> On Wed, 1 Aug 2007 13:06:23 -0700
> "Keshavamurthy, Anil S" <[EMAIL PROTECTED]> wrote:
> 
> > +/* Computes the padding size required, to make the
> > + * the start address naturally aligned on its size
> > + */
> > +static int
> > +iova_get_pad_size(int size, unsigned int limit_pfn)
> > +{
> > +   unsigned int pad_size = 0;
> > +   unsigned int order = ilog2(size);
> > +
> > +   if (order)
> > +   pad_size = (limit_pfn + 1) % (1 << order);
> > +
> > +   return pad_size;
> > +}
> 
> This isn't obviously doing the right thing for non-power-of-2 inputs.
> ilog2() rounds down...
Andrew,
The call chain to iova_get_pad_size() is like this
alloc_iova()--->__alloc_iova_range()--->iova_get_pad_size().

Inside the alloc_iova() we are rounding the size to __roundup_pow_of_two(size)
iff the caller of alloc_iova() request by setting size_aligned bool. And
in every call to iova_get_pad_size() we check the size_aligned bool before 
calling
iova_get_pad_size.  Hence I don;t see any issues. If you want I can insert a 
BUG_ON() statement inside the above iova_get_pad_size() function. 
Please do let me know.

thanks,
Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch -mm][Intel-IOMMU] Optimize sg map/unmap calls

2007-08-02 Thread Keshavamurthy, Anil S

On Wed, Aug 01, 2007 at 04:45:54PM -0700, Andrew Morton wrote:
 On Wed, 1 Aug 2007 13:06:23 -0700
 Keshavamurthy, Anil S [EMAIL PROTECTED] wrote:
 
  +/* Computes the padding size required, to make the
  + * the start address naturally aligned on its size
  + */
  +static int
  +iova_get_pad_size(int size, unsigned int limit_pfn)
  +{
  +   unsigned int pad_size = 0;
  +   unsigned int order = ilog2(size);
  +
  +   if (order)
  +   pad_size = (limit_pfn + 1) % (1  order);
  +
  +   return pad_size;
  +}
 
 This isn't obviously doing the right thing for non-power-of-2 inputs.
 ilog2() rounds down...
Andrew,
The call chain to iova_get_pad_size() is like this
alloc_iova()---__alloc_iova_range()---iova_get_pad_size().

Inside the alloc_iova() we are rounding the size to __roundup_pow_of_two(size)
iff the caller of alloc_iova() request by setting size_aligned bool. And
in every call to iova_get_pad_size() we check the size_aligned bool before 
calling
iova_get_pad_size.  Hence I don;t see any issues. If you want I can insert a 
BUG_ON() statement inside the above iova_get_pad_size() function. 
Please do let me know.

thanks,
Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch -mm][Intel-IOMMU] Optimize sg map/unmap calls

2007-08-01 Thread Keshavamurthy, Anil S

This patch adds PageSelectiveInvalidation support 
replacing existing DomainSelectiveInvalidation for 
intel_{map/unmap}_sg() calls and also enables
to mapping one big contiguous DMA virtual address
which is mapped to discontiguous physical address
for SG map/unmap calls.


"Doamin selective invalidations" wipes out the IOMMU address
translation cache based on domain ID where as "Page selective
invalidations" wipes out the IOMMU address translation cache for
that address mask range which is more cache friendly when
compared to Domain selective invalidations.


Here is how it is done.
1) changes to iova.c
alloc_iova() now takes a bool size_aligned argument, which
when when set, returns the io virtual address that is 
naturally aligned to 2 ^ x, where x is the order 
of the size requested.

Returning this io vitual address which is naturally 
aligned helps iommu to do the "page selective 
invalidations" which is IOMMU cache friendly
over "domain selective invalidations".

2) Changes to driver/pci/intel-iommu.c
Clean up intel_{map/unmap}_{single/sg} () calls so that
s/g map/unamp calls is no more dependent on 
intel_{map/unmap}_single()

intel_map_sg() now computes the total DMA virtual address
required and allocates the size aligned total DMA virtual address
and maps the discontiguous physical address to the allocated
contiguous DMA virtual address.

In the intel_unmap_sg() case since the DMA virtual address
is contiguous and size_aligned, PageSelectiveInvalidation
is used replacing earlier DomainSelectiveInvalidations.

Andrew,
Please add this patch to you mm queue.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 drivers/pci/intel-iommu.c |  325 +-
 drivers/pci/iova.c|   63 +++-
 drivers/pci/iova.h|3 
 3 files changed, 231 insertions(+), 160 deletions(-)

Index: work/drivers/pci/intel-iommu.c
===
--- work.orig/drivers/pci/intel-iommu.c 2007-08-01 11:18:12.0 -0700
+++ work/drivers/pci/intel-iommu.c  2007-08-01 11:18:31.0 -0700
@@ -665,24 +665,10 @@
non_present_entry_flush);
 }
 
-static int iommu_get_alignment(u64 base, unsigned int size)
-{
-   int t = 0;
-   u64 end;
-
-   end = base + size - 1;
-   while (base != end) {
-   t++;
-   base >>= 1;
-   end >>= 1;
-   }
-   return t;
-}
-
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
u64 addr, unsigned int pages, int non_present_entry_flush)
 {
-   unsigned int align;
+   unsigned int mask;
 
BUG_ON(addr & (~PAGE_MASK_4K));
BUG_ON(pages == 0);
@@ -696,16 +682,13 @@
 * PSI requires page size to be 2 ^ x, and the base address is naturally
 * aligned to the size
 */
-   align = iommu_get_alignment(addr >> PAGE_SHIFT_4K, pages);
+   mask = ilog2(__roundup_pow_of_two(pages));
/* Fallback to domain selective flush if size is too big */
-   if (align > cap_max_amask_val(iommu->cap))
+   if (mask > cap_max_amask_val(iommu->cap))
return iommu_flush_iotlb_dsi(iommu, did,
non_present_entry_flush);
 
-   addr >>= PAGE_SHIFT_4K + align;
-   addr <<= PAGE_SHIFT_4K + align;
-
-   return __iommu_flush_iotlb(iommu, did, addr, align,
+   return __iommu_flush_iotlb(iommu, did, addr, mask,
DMA_TLB_PSI_FLUSH, non_present_entry_flush);
 }
 
@@ -1772,78 +1755,103 @@
 }
 
 struct iova *
-iommu_alloc_iova(struct dmar_domain *domain, void *host_addr, size_t size,
-   u64 start, u64 end)
+iommu_alloc_iova(struct dmar_domain *domain, size_t size, u64 end)
 {
-   u64 start_addr;
struct iova *piova;
 
/* Make sure it's in range */
-   if ((start > DOMAIN_MAX_ADDR(domain->gaw)) || end < start)
-   return NULL;
-
end = min_t(u64, DOMAIN_MAX_ADDR(domain->gaw), end);
-   start_addr = PAGE_ALIGN_4K(start);
-   size = aligned_size((u64)host_addr, size);
-   if (!size || (start_addr + size > end))
+   if (!size || (IOVA_START_ADDR + size > end))
return NULL;
 
piova = alloc_iova(>iovad,
-   size >> PAGE_SHIFT_4K, IOVA_PFN(end));
-
+   size >> PAGE_SHIFT_4K, IOVA_PFN(end), 1);
return piova;
 }
 
-static dma_addr_t __intel_map_single(struct device *dev, void *addr,
-   size_t size, int dir, u64 *flush_addr, unsigned int *flush_size)
+static struct iova *
+__intel_alloc_iova(struct device *dev, struct dmar_domain *domain,
+   size_t size)
 {
-   struct dmar_domain *domain;
struct pci_dev *pdev = to_pci_dev(dev);
-   int ret;
-   int prot = 0;
struct iova *iova = NULL;
-   u64 start_addr;
-
-   addr = (void *)virt_to_phys(addr);
-
-   domain = get_domain_for_dev(pdev,
-

[patch -mm][Intel-IOMMU] Optimize sg map/unmap calls

2007-08-01 Thread Keshavamurthy, Anil S

This patch adds PageSelectiveInvalidation support 
replacing existing DomainSelectiveInvalidation for 
intel_{map/unmap}_sg() calls and also enables
to mapping one big contiguous DMA virtual address
which is mapped to discontiguous physical address
for SG map/unmap calls.


Doamin selective invalidations wipes out the IOMMU address
translation cache based on domain ID where as Page selective
invalidations wipes out the IOMMU address translation cache for
that address mask range which is more cache friendly when
compared to Domain selective invalidations.


Here is how it is done.
1) changes to iova.c
alloc_iova() now takes a bool size_aligned argument, which
when when set, returns the io virtual address that is 
naturally aligned to 2 ^ x, where x is the order 
of the size requested.

Returning this io vitual address which is naturally 
aligned helps iommu to do the page selective 
invalidations which is IOMMU cache friendly
over domain selective invalidations.

2) Changes to driver/pci/intel-iommu.c
Clean up intel_{map/unmap}_{single/sg} () calls so that
s/g map/unamp calls is no more dependent on 
intel_{map/unmap}_single()

intel_map_sg() now computes the total DMA virtual address
required and allocates the size aligned total DMA virtual address
and maps the discontiguous physical address to the allocated
contiguous DMA virtual address.

In the intel_unmap_sg() case since the DMA virtual address
is contiguous and size_aligned, PageSelectiveInvalidation
is used replacing earlier DomainSelectiveInvalidations.

Andrew,
Please add this patch to you mm queue.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]

---
 drivers/pci/intel-iommu.c |  325 +-
 drivers/pci/iova.c|   63 +++-
 drivers/pci/iova.h|3 
 3 files changed, 231 insertions(+), 160 deletions(-)

Index: work/drivers/pci/intel-iommu.c
===
--- work.orig/drivers/pci/intel-iommu.c 2007-08-01 11:18:12.0 -0700
+++ work/drivers/pci/intel-iommu.c  2007-08-01 11:18:31.0 -0700
@@ -665,24 +665,10 @@
non_present_entry_flush);
 }
 
-static int iommu_get_alignment(u64 base, unsigned int size)
-{
-   int t = 0;
-   u64 end;
-
-   end = base + size - 1;
-   while (base != end) {
-   t++;
-   base = 1;
-   end = 1;
-   }
-   return t;
-}
-
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
u64 addr, unsigned int pages, int non_present_entry_flush)
 {
-   unsigned int align;
+   unsigned int mask;
 
BUG_ON(addr  (~PAGE_MASK_4K));
BUG_ON(pages == 0);
@@ -696,16 +682,13 @@
 * PSI requires page size to be 2 ^ x, and the base address is naturally
 * aligned to the size
 */
-   align = iommu_get_alignment(addr  PAGE_SHIFT_4K, pages);
+   mask = ilog2(__roundup_pow_of_two(pages));
/* Fallback to domain selective flush if size is too big */
-   if (align  cap_max_amask_val(iommu-cap))
+   if (mask  cap_max_amask_val(iommu-cap))
return iommu_flush_iotlb_dsi(iommu, did,
non_present_entry_flush);
 
-   addr = PAGE_SHIFT_4K + align;
-   addr = PAGE_SHIFT_4K + align;
-
-   return __iommu_flush_iotlb(iommu, did, addr, align,
+   return __iommu_flush_iotlb(iommu, did, addr, mask,
DMA_TLB_PSI_FLUSH, non_present_entry_flush);
 }
 
@@ -1772,78 +1755,103 @@
 }
 
 struct iova *
-iommu_alloc_iova(struct dmar_domain *domain, void *host_addr, size_t size,
-   u64 start, u64 end)
+iommu_alloc_iova(struct dmar_domain *domain, size_t size, u64 end)
 {
-   u64 start_addr;
struct iova *piova;
 
/* Make sure it's in range */
-   if ((start  DOMAIN_MAX_ADDR(domain-gaw)) || end  start)
-   return NULL;
-
end = min_t(u64, DOMAIN_MAX_ADDR(domain-gaw), end);
-   start_addr = PAGE_ALIGN_4K(start);
-   size = aligned_size((u64)host_addr, size);
-   if (!size || (start_addr + size  end))
+   if (!size || (IOVA_START_ADDR + size  end))
return NULL;
 
piova = alloc_iova(domain-iovad,
-   size  PAGE_SHIFT_4K, IOVA_PFN(end));
-
+   size  PAGE_SHIFT_4K, IOVA_PFN(end), 1);
return piova;
 }
 
-static dma_addr_t __intel_map_single(struct device *dev, void *addr,
-   size_t size, int dir, u64 *flush_addr, unsigned int *flush_size)
+static struct iova *
+__intel_alloc_iova(struct device *dev, struct dmar_domain *domain,
+   size_t size)
 {
-   struct dmar_domain *domain;
struct pci_dev *pdev = to_pci_dev(dev);
-   int ret;
-   int prot = 0;
struct iova *iova = NULL;
-   u64 start_addr;
-
-   addr = (void *)virt_to_phys(addr);
-
-   domain = get_domain_for_dev(pdev,
-

RE: OOPS at dmar_table_init (2.6.22-rc6-mm1 kernel)

2007-07-10 Thread Keshavamurthy, Anil S

(Sorry for the top post) 

This bug is fixed and the fix should appear in the next MM release.
See this link for the patch
http://marc.info/?l=linux-kernel=118313130109808=2 

-Anil
-Original Message-
From: Randy Dunlap [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 10, 2007 9:50 AM
To: Pavel Emelianov
Cc: Raj, Ashok; Li, Shaohua; Keshavamurthy, Anil S; Linux Kernel Mailing
List; Andrew Morton
Subject: Re: OOPS at dmar_table_init (2.6.22-rc6-mm1 kernel)

On Tue, 10 Jul 2007 17:57:25 +0400 Pavel Emelianov wrote:

> Hi.
> 
> While working with Andrew's kernel I faced an OOPS on x86_64 
> machine. Unfortunately kernel logs are not appearing in serial 
> console by the time oops happens, but I have some info on the
> screen:
> 
> OOPs is at dmar_table_init() here:
> 80532873 :
> 80532873:   41 55   push   %r13
> 80532875:   41 54   push   %r12
> 80532877:   55  push   %rbp
> 80532878:   53  push   %rbx
> 80532879:   51  push   %rcx
> 8053287a:   4c 8b 2d 07 21 01 00mov
73991(%rip),%r13# 80544988 
> 80532881:   41 0f b6 45 24  movzbl 0x24(%r13),%eax
<<<<<<<<<<< OOPS, %r13 = NULL
> 80532886:   84 c0   test   %al,%al
> 80532888:   75 11   jne
8053289b 
> 8053288a:   48 c7 c7 b0 3d 45 80mov
$0x80453db0,%rdi
> 80532891:   e8 c2 4a cf ff  callq
80227358 
> 80532896:   e9 f2 01 00 00  jmpq
80532a8d 
> 
> Looks like dmar_tbl is NULL. 
> BUG is 100% reproducible. Is there any other info that can be useful?

You could test the patch from
http://lkml.org/lkml/2007/6/29/174 (see "Get diff 1").


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code
***
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: OOPS at dmar_table_init (2.6.22-rc6-mm1 kernel)

2007-07-10 Thread Keshavamurthy, Anil S

(Sorry for the top post) 

This bug is fixed and the fix should appear in the next MM release.
See this link for the patch
http://marc.info/?l=linux-kernelm=118313130109808w=2 

-Anil
-Original Message-
From: Randy Dunlap [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 10, 2007 9:50 AM
To: Pavel Emelianov
Cc: Raj, Ashok; Li, Shaohua; Keshavamurthy, Anil S; Linux Kernel Mailing
List; Andrew Morton
Subject: Re: OOPS at dmar_table_init (2.6.22-rc6-mm1 kernel)

On Tue, 10 Jul 2007 17:57:25 +0400 Pavel Emelianov wrote:

 Hi.
 
 While working with Andrew's kernel I faced an OOPS on x86_64 
 machine. Unfortunately kernel logs are not appearing in serial 
 console by the time oops happens, but I have some info on the
 screen:
 
 OOPs is at dmar_table_init() here:
 80532873 dmar_table_init:
 80532873:   41 55   push   %r13
 80532875:   41 54   push   %r12
 80532877:   55  push   %rbp
 80532878:   53  push   %rbx
 80532879:   51  push   %rcx
 8053287a:   4c 8b 2d 07 21 01 00mov
73991(%rip),%r13# 80544988 dmar_tbl
 80532881:   41 0f b6 45 24  movzbl 0x24(%r13),%eax
 OOPS, %r13 = NULL
 80532886:   84 c0   test   %al,%al
 80532888:   75 11   jne
8053289b dmar_table_init+0x28
 8053288a:   48 c7 c7 b0 3d 45 80mov
$0x80453db0,%rdi
 80532891:   e8 c2 4a cf ff  callq
80227358 printk
 80532896:   e9 f2 01 00 00  jmpq
80532a8d dmar_table_init+0x21a
 
 Looks like dmar_tbl is NULL. 
 BUG is 100% reproducible. Is there any other info that can be useful?

You could test the patch from
http://lkml.org/lkml/2007/6/29/174 (see Get diff 1).


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code
***
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [-mm patch] fix duplicate CONFIG_DMAR Makefile line

2007-07-02 Thread Keshavamurthy, Anil S

On Sun, Jul 01, 2007 at 10:22:43PM +0200, Adrian Bunk wrote:
> On Thu, Jun 28, 2007 at 03:43:21AM -0700, Andrew Morton wrote:
> >...
> > Changes since 2.6.22-rc4-mm2:
> >...
> > +intel-iommu-intel-iommu-driver.patch
> >...
> >  Intel IOMMU support
> >...
> 
> 
> Contrary to popular belief, two identical Makefile lines don't bring any 
> advantages.   ;-)
:). Thanks for the fix.

> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Acked-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

> 
> ---
> --- linux-2.6.22-rc6-mm1/drivers/pci/Makefile.old 2007-06-30 
> 03:33:54.0 +0200
> +++ linux-2.6.22-rc6-mm1/drivers/pci/Makefile 2007-06-30 03:34:04.0 
> +0200
> @@ -23,9 +23,6 @@
>  # Build Intel IOMMU support
>  obj-$(CONFIG_DMAR) += dmar.o iova.o intel-iommu.o
>  
> -#Build Intel-IOMMU support
> -obj-$(CONFIG_DMAR) += iova.o dmar.o intel-iommu.o
> -
>  #
>  # Some architectures use the generic PCI setup functions
>  #
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [-mm patch] fix duplicate CONFIG_DMAR Makefile line

2007-07-02 Thread Keshavamurthy, Anil S

On Sun, Jul 01, 2007 at 10:22:43PM +0200, Adrian Bunk wrote:
 On Thu, Jun 28, 2007 at 03:43:21AM -0700, Andrew Morton wrote:
 ...
  Changes since 2.6.22-rc4-mm2:
 ...
  +intel-iommu-intel-iommu-driver.patch
 ...
   Intel IOMMU support
 ...
 
 
 Contrary to popular belief, two identical Makefile lines don't bring any 
 advantages.   ;-)
:). Thanks for the fix.

 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
Acked-by: Anil S Keshavamurthy [EMAIL PROTECTED]

 
 ---
 --- linux-2.6.22-rc6-mm1/drivers/pci/Makefile.old 2007-06-30 
 03:33:54.0 +0200
 +++ linux-2.6.22-rc6-mm1/drivers/pci/Makefile 2007-06-30 03:34:04.0 
 +0200
 @@ -23,9 +23,6 @@
  # Build Intel IOMMU support
  obj-$(CONFIG_DMAR) += dmar.o iova.o intel-iommu.o
  
 -#Build Intel-IOMMU support
 -obj-$(CONFIG_DMAR) += iova.o dmar.o intel-iommu.o
 -
  #
  # Some architectures use the generic PCI setup functions
  #
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64

2007-06-29 Thread Keshavamurthy, Anil S

On Fri, Jun 29, 2007 at 12:23:43PM -0400, Muli Ben-Yehuda wrote:
> On Fri, Jun 29, 2007 at 08:28:58AM -0700, Keshavamurthy, Anil S wrote:
> 
> > +++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-29 07:46:25.0 
> > -0700
> > @@ -260,6 +260,8 @@
> > int ret = 0;
> >  
> > dmar = (struct acpi_table_dmar *)dmar_tbl;
> > +   if (!dmar)
> > +   return -ENODEV;
> >  
> > if (!dmar->width) {
> > printk (KERN_WARNING PREFIX "Zero: Invalid DMAR haw\n");
> > @@ -301,7 +303,7 @@
> >  
> > parse_dmar_table();
> > if (list_empty(_drhd_units)) {
> > -   printk(KERN_ERR PREFIX "No DMAR devices found\n");
> > +   printk(KERN_INFO PREFIX "No DMAR devices found\n");
> > return -ENODEV;
> > }
> > return 0;
> 
> The convention is to print a KERN_DEBUG message if hardware is not
> found when probing it, otherwise the boot messages become cluttered
> with lots of "$FOO not found".

Since this is IOMMU is built into the kernel and it is
good idea to report that the device is not present. The 
above is printed only once and is consistent with other
IOMMU implementation. Atleast it is useful when people 
report bugs we can makeout whether IOMMU is being detected
or not.

Here is what I see on my box.
[..]
"PCI-GART: No AMD northbridge found."
[..]
Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[..]

As you can see I don;t have either GART or Calgary on my box.

-Thanks,
Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64

2007-06-29 Thread Keshavamurthy, Anil S

On Thu, Jun 28, 2007 at 06:14:27PM -0700, Li, Shaohua wrote:
> 
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-
> >rc6/2.6.22-rc6-mm1/
> >>
> >>> +intel-iommu-dmar-detection-and-parsing-logic.patch
[..]
> >
> >I took a picture of it, looks like the backtrace is:
> >
> >NULL pointer dereference at 024
> >EIP:dmar_table_init+0x11
> >intel_iommu_init+0x30
> >pci_iommu_init+0xe
> >kernel_init+0x16e
> >
> >Presumably something is NULL in dmar_table_init that wasn't expected to
> >be.. I would guess it likely crashes on any system without an Intel
> >IOMMU in it.
Yup, that is correct.

> How about something like below?
> 
> 
> int __init dmar_table_init(void)
> {
> + if (!dmar_tbl)
> + return -ENODEV;
>   parse_dmar_table();
why not check for NULL in the function where it touched?
Also when there are no DMAR devices we need the below
printk on the console.

>   if (list_empty(_drhd_units)) {
>   printk(KERN_ERR PREFIX "No DMAR devices found\n");
>   return -ENODEV;
>   }
>   return 0;
> }

Here is the revised patch of the above.
Andrew, please add this fix to
+intel-iommu-dmar-detection-and-parsing-logic.patch


Check for dmar_tbl pointer as this can be NULL on 
systems with no Intel VT-d support.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 drivers/pci/dmar.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc4-mm2/drivers/pci/dmar.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/dmar.c2007-06-29 
07:43:43.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-29 07:46:25.0 
-0700
@@ -260,6 +260,8 @@
int ret = 0;
 
dmar = (struct acpi_table_dmar *)dmar_tbl;
+   if (!dmar)
+   return -ENODEV;
 
if (!dmar->width) {
printk (KERN_WARNING PREFIX "Zero: Invalid DMAR haw\n");
@@ -301,7 +303,7 @@
 
parse_dmar_table();
if (list_empty(_drhd_units)) {
-   printk(KERN_ERR PREFIX "No DMAR devices found\n");
+   printk(KERN_INFO PREFIX "No DMAR devices found\n");
return -ENODEV;
}
return 0;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64

2007-06-29 Thread Keshavamurthy, Anil S

On Thu, Jun 28, 2007 at 06:14:27PM -0700, Li, Shaohua wrote:
 
 
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-
 rc6/2.6.22-rc6-mm1/
 
  +intel-iommu-dmar-detection-and-parsing-logic.patch
[..]
 
 I took a picture of it, looks like the backtrace is:
 
 NULL pointer dereference at 024
 EIP:dmar_table_init+0x11
 intel_iommu_init+0x30
 pci_iommu_init+0xe
 kernel_init+0x16e
 
 Presumably something is NULL in dmar_table_init that wasn't expected to
 be.. I would guess it likely crashes on any system without an Intel
 IOMMU in it.
Yup, that is correct.

 How about something like below?
 
 
 int __init dmar_table_init(void)
 {
 + if (!dmar_tbl)
 + return -ENODEV;
   parse_dmar_table();
why not check for NULL in the function where it touched?
Also when there are no DMAR devices we need the below
printk on the console.

   if (list_empty(dmar_drhd_units)) {
   printk(KERN_ERR PREFIX No DMAR devices found\n);
   return -ENODEV;
   }
   return 0;
 }

Here is the revised patch of the above.
Andrew, please add this fix to
+intel-iommu-dmar-detection-and-parsing-logic.patch


Check for dmar_tbl pointer as this can be NULL on 
systems with no Intel VT-d support.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]

---
 drivers/pci/dmar.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc4-mm2/drivers/pci/dmar.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/dmar.c2007-06-29 
07:43:43.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-29 07:46:25.0 
-0700
@@ -260,6 +260,8 @@
int ret = 0;
 
dmar = (struct acpi_table_dmar *)dmar_tbl;
+   if (!dmar)
+   return -ENODEV;
 
if (!dmar-width) {
printk (KERN_WARNING PREFIX Zero: Invalid DMAR haw\n);
@@ -301,7 +303,7 @@
 
parse_dmar_table();
if (list_empty(dmar_drhd_units)) {
-   printk(KERN_ERR PREFIX No DMAR devices found\n);
+   printk(KERN_INFO PREFIX No DMAR devices found\n);
return -ENODEV;
}
return 0;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64

2007-06-29 Thread Keshavamurthy, Anil S

On Fri, Jun 29, 2007 at 12:23:43PM -0400, Muli Ben-Yehuda wrote:
 On Fri, Jun 29, 2007 at 08:28:58AM -0700, Keshavamurthy, Anil S wrote:
 
  +++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-29 07:46:25.0 
  -0700
  @@ -260,6 +260,8 @@
  int ret = 0;
   
  dmar = (struct acpi_table_dmar *)dmar_tbl;
  +   if (!dmar)
  +   return -ENODEV;
   
  if (!dmar-width) {
  printk (KERN_WARNING PREFIX Zero: Invalid DMAR haw\n);
  @@ -301,7 +303,7 @@
   
  parse_dmar_table();
  if (list_empty(dmar_drhd_units)) {
  -   printk(KERN_ERR PREFIX No DMAR devices found\n);
  +   printk(KERN_INFO PREFIX No DMAR devices found\n);
  return -ENODEV;
  }
  return 0;
 
 The convention is to print a KERN_DEBUG message if hardware is not
 found when probing it, otherwise the boot messages become cluttered
 with lots of $FOO not found.

Since this is IOMMU is built into the kernel and it is
good idea to report that the device is not present. The 
above is printed only once and is consistent with other
IOMMU implementation. Atleast it is useful when people 
report bugs we can makeout whether IOMMU is being detected
or not.

Here is what I see on my box.
[..]
PCI-GART: No AMD northbridge found.
[..]
Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[..]

As you can see I don;t have either GART or Calgary on my box.

-Thanks,
Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 10/10] Iommu floppy workaround

2007-06-26 Thread Keshavamurthy, Anil S

On Tue, Jun 26, 2007 at 12:37:55PM +0200, Andi Kleen wrote:
> 
> > > Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig
> > > ===
> > > --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-18 
> > > 15:45:08.0 -0700
> > > +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig  2007-06-18 
> > > 15:45:09.0 -0700
> > > @@ -752,6 +752,16 @@
> > >all the OS visible memory. Hence the driver can continue
> > >to use physical addresses for DMA.
> > >  
> > > +config DMAR_FLPY_WA
> > 
> > FLOPPY is spelled "FLOPPY"!
> 
> Also this shouldn't be a user visible config.  The floppy driver should just
> do this transparently when loaded and undo when unloaded.

Yup, I agree. 

Here goes the patch to make it user invisible config option.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>
---
 arch/x86_64/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig   2007-06-26 
12:04:42.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-26 12:06:01.0 
-0700
@@ -753,7 +753,7 @@
 to use physical addresses for DMA.
 
 config DMAR_FLOPPY_WA
-   bool "Support for Floppy disk workaround"
+   bool
depends on DMAR
default y
help

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 05/10] Intel IOMMU driver

2007-06-26 Thread Keshavamurthy, Anil S

On Mon, Jun 25, 2007 at 11:25:11PM -0700, Andrew Morton wrote:
> On Tue, 19 Jun 2007 14:37:06 -0700 "Keshavamurthy, Anil S" <[EMAIL 
> PROTECTED]> wrote:
> 
> 
> None of these actually _need_ to be macros and it would be better to 
> implement them in C.  That way things are more self-documenting, more
> pleasant to read, more likely to get commented and you'll fix the
> two bugs wherein the argument to a macro is evaluated more than once.
Agree. Will send a patch when I get back from OLS as any changes to this
needs thorugh testing.

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 05/10] Intel IOMMU driver

2007-06-26 Thread Keshavamurthy, Anil S

On Mon, Jun 25, 2007 at 11:32:49PM -0700, Andrew Morton wrote:
> On Tue, 19 Jun 2007 16:32:23 -0700 (PDT) Christoph Lameter <[EMAIL 
> PROTECTED]> wrote:
> 
> > On Tue, 19 Jun 2007, Keshavamurthy, Anil S wrote:
> > 
> > > +static inline void *alloc_pgtable_page(void)
> > > +{
> > > + return (void *)get_zeroed_page(GFP_ATOMIC);
> > > +}
> > 
> > Need to pass gfp_t parameter. Repeates a couple of times.
> > ...
> > Is it not possible here to drop the lock and do the alloc with GFP_KERNEL 
> > and deal with the resulting race? That is done in other parts of the 
> > kernel.
> > ...
> > This may be able to become a GFP_KERNEL alloc since interrupts are enabled 
> > at this point?
> > ...
> > GFP_KERNEL alloc possible?
> > 
> 
> Yeah, if there are any callsites at all at which we know that we can
> perform a sleeping allocation, Christoph's suggestions should be adopted. 
> Because even a bare GFP_NOIO is heaps more robust than GFP_ATOMIC, and it
> will also reload the free-pages reserves, making subsequent GFP_ATOMIC
> allocations more likely to succeed.
Yup, will do as part of making this code work for IA64, which is my next
item in my todo list.

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 10/10] Iommu floppy workaround

2007-06-26 Thread Keshavamurthy, Anil S

On Mon, Jun 25, 2007 at 11:42:22PM -0700, Andrew Morton wrote:
> On Tue, 19 Jun 2007 14:37:11 -0700 "Keshavamurthy, Anil S" <[EMAIL 
> PROTECTED]> wrote:
> 
> Bit weird that this was implemented in the header like that.
Sorry, it is my mistake as I understood thus from your previous
code review comment.
> 
> How about this?  (Also contains rather a lot of obvious style fixes)
Yup, looks good.

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 04/10] IOVA allocation and management routines

2007-06-26 Thread Keshavamurthy, Anil S

On Mon, Jun 25, 2007 at 11:07:47PM -0700, Andrew Morton wrote:
> On Tue, 19 Jun 2007 14:37:05 -0700 "Keshavamurthy, Anil S" <[EMAIL 
> PROTECTED]> wrote:
> 
> All the inlines in this code are pretty pointless: all those functions have
> a single callsite so the compiler inlines them anyway.  If we later add
> more callsites for these functions, they're too big to be inlined.
> 
> inline is usually wrong: don't do it!
Yup, I agree and will follow in future.

> > +
> > +/**
> > + * find_iova - find's an iova for a given pfn
> > + * @iovad - iova domain in question.
> > + * pfn - page frame number
> > + * This function finds and returns an iova belonging to the
> > + * given doamin which matches the given pfn.
> > + */
> > +struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn)
> > +{
> > +   unsigned long flags;
> > +   struct rb_node *node;
> > +
> > +   spin_lock_irqsave(>iova_rbtree_lock, flags);
> > +   node = iovad->rbroot.rb_node;
> > +   while (node) {
> > +   struct iova *iova = container_of(node, struct iova, node);
> > +
> > +   /* If pfn falls within iova's range, return iova */
> > +   if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
> > +   spin_unlock_irqrestore(>iova_rbtree_lock, flags);
> > +   return iova;
> > +   }
> > +
> > +   if (pfn < iova->pfn_lo)
> > +   node = node->rb_left;
> > +   else if (pfn > iova->pfn_lo)
> > +   node = node->rb_right;
> > +   }
> > +
> > +   spin_unlock_irqrestore(>iova_rbtree_lock, flags);
> > +   return NULL;
> > +}
> 
> So we take the lock, look up an item, then drop the lock then return the
> item we just found.  We took no refcount on it and we didn't do anything to
> keep this object alive.
> 
> Is that a bug, or does the (afacit undocumented) lifecycle management of
> these things take care of it in some manner?  If yes, please reply via an
> add-a-comment patch.

Nope, this is not a bug. Adding a comment patch which explains the same.

> 
> 
> > +/**
> > + * __free_iova - frees the given iova
> > + * @iovad: iova domain in question.
> > + * @iova: iova in question.
> > + * Frees the given iova belonging to the giving domain
> > + */
> > +void
> > +__free_iova(struct iova_domain *iovad, struct iova *iova)
> > +{
> > +   unsigned long flags;
> > +
> > +   if (iova) {
> > +   spin_lock_irqsave(>iova_rbtree_lock, flags);
> > +   __cached_rbnode_delete_update(iovad, iova);
> > +   rb_erase(>node, >rbroot);
> > +   spin_unlock_irqrestore(>iova_rbtree_lock, flags);
> > +   free_iova_mem(iova);
> > +   }
> > +}
> 

> Can this really be called with NULL?  If so, under what circumstances? 
> (This reader couldn't work it out from a brief look at the code, so perhaps
> others will not be able to either.  Perhaps a comment is needed)

It was getting called from only one place free_iova().
Below patch address your concern.

> 
> > +/**
> > + * free_iova - finds and frees the iova for a given pfn
> > + * @iovad: - iova domain in question.
> > + * @pfn: - pfn that is allocated previously
> > + * This functions finds an iova for a given pfn and then
> > + * frees the iova from that domain.
> > + */
> > +void
> > +free_iova(struct iova_domain *iovad, unsigned long pfn)
> > +{
> > +   struct iova *iova = find_iova(iovad, pfn);
> > +   __free_iova(iovad, iova);
> > +
> > +}
> > +
> > +/**
> > + * put_iova_domain - destroys the iova doamin
> > + * @iovad: - iova domain in question.
> > + * All the iova's in that domain are destroyed.
> > + */
> > +void put_iova_domain(struct iova_domain *iovad)
> > +{
> > +   struct rb_node *node;
> > +   unsigned long flags;
> > +
> > +   spin_lock_irqsave(>iova_rbtree_lock, flags);
> > +   node = rb_first(>rbroot);
> > +   while (node) {
> > +   struct iova *iova = container_of(node, struct iova, node);
> > +   rb_erase(node, >rbroot);
> > +   free_iova_mem(iova);
> > +   node = rb_first(>rbroot);
> > +   }
> > +   spin_unlock_irqrestore(>iova_rbtree_lock, flags);
> > +}
> 
> Right, so I suspect what's happening here is that all iova's remain valid
> until their entire domain is destroyed, yes?

Nope. IOVA are valid only for the duration of DMA

Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

2007-06-26 Thread Keshavamurthy, Anil S

On Tue, Jun 26, 2007 at 11:11:25AM -0400, Muli Ben-Yehuda wrote:
> On Tue, Jun 26, 2007 at 08:03:59AM -0700, Arjan van de Ven wrote:
> > Muli Ben-Yehuda wrote:
> > >How much? we have numbers (to be presented at OLS later this week)
> > >that show that on bare-metal an IOMMU can cost as much as 15%-30% more
> > >CPU utilization for an IO intensive workload (netperf). It will be
> > >interesting to see comparable numbers for VT-d.
> > 
> > for VT-d it is a LOT less. I'll let anil give you his data :)
> 
> Looking forward to it. Note that this is on a large SMP machine with
> Gigabit ethernet, with netperf TCP stream. Comparing numbers for other
> benchmarks on other machines is ... less than useful, but the numbers
> themeselves are interesting.
Our initial benchmark results showed we had around 3% extra CPU 
utilization overhead when compared to native(i.e without IOMMU).
Again, our benchmark was on small SMP machine and we used
iperf and a 1G ethernet cards.

Going forward we will do more benchmark tests and will share the
results.

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 02/10] PCI generic helper function

2007-06-26 Thread Keshavamurthy, Anil S

On Mon, Jun 25, 2007 at 10:49:37PM -0700, Andrew Morton wrote:
> On Tue, 19 Jun 2007 14:37:03 -0700 "Keshavamurthy, Anil S" <[EMAIL 
> PROTECTED]> wrote:
> 
> > +struct pci_dev *
> > +pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
> 
> You didn't need a newline there, but that's what the rest of that file
> does.  Hu hum.
> 
> > +{
> > +   struct pci_dev *tmp = NULL;
> > +
> > +   if (pdev->is_pcie)
> > +   return NULL;
> > +   while (1) {
> > +   if (!pdev->bus->self)
> > +   break;
> > +   pdev = pdev->bus->self;
> > +   /* a p2p bridge */
> > +   if (!pdev->is_pcie) {
> > +   tmp = pdev;
> > +   continue;
> > +   }
> > +   /* PCI device should connect to a PCIE bridge */
> > +   BUG_ON(pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE);
> 
> I assume that if this bug triggers, we've found some broken hardware?
> 
> Going BUG seems like a pretty rude reaction to this, especially when it
> would be so easy to drop a warning and then recover.
> 
> 
> How's about this?
Looks good, thanks.

> 
> --- a/drivers/pci/search.c~intel-iommu-pci-generic-helper-function-fix
> +++ a/drivers/pci/search.c
> @@ -38,7 +38,11 @@ pci_find_upstream_pcie_bridge(struct pci
>   continue;
>   }
>   /* PCI device should connect to a PCIE bridge */
> - BUG_ON(pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE);
> + if (pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE) {
> + /* Busted hardware? */
> + WARN_ON_ONCE(1);
> + return NULL;
> + }
>   return pdev;
>   }
>  
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 02/10] PCI generic helper function

2007-06-26 Thread Keshavamurthy, Anil S

On Mon, Jun 25, 2007 at 10:49:37PM -0700, Andrew Morton wrote:
 On Tue, 19 Jun 2007 14:37:03 -0700 Keshavamurthy, Anil S [EMAIL 
 PROTECTED] wrote:
 
  +struct pci_dev *
  +pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
 
 You didn't need a newline there, but that's what the rest of that file
 does.  Hu hum.
 
  +{
  +   struct pci_dev *tmp = NULL;
  +
  +   if (pdev-is_pcie)
  +   return NULL;
  +   while (1) {
  +   if (!pdev-bus-self)
  +   break;
  +   pdev = pdev-bus-self;
  +   /* a p2p bridge */
  +   if (!pdev-is_pcie) {
  +   tmp = pdev;
  +   continue;
  +   }
  +   /* PCI device should connect to a PCIE bridge */
  +   BUG_ON(pdev-pcie_type != PCI_EXP_TYPE_PCI_BRIDGE);
 
 I assume that if this bug triggers, we've found some broken hardware?
 
 Going BUG seems like a pretty rude reaction to this, especially when it
 would be so easy to drop a warning and then recover.
 
 
 How's about this?
Looks good, thanks.

 
 --- a/drivers/pci/search.c~intel-iommu-pci-generic-helper-function-fix
 +++ a/drivers/pci/search.c
 @@ -38,7 +38,11 @@ pci_find_upstream_pcie_bridge(struct pci
   continue;
   }
   /* PCI device should connect to a PCIE bridge */
 - BUG_ON(pdev-pcie_type != PCI_EXP_TYPE_PCI_BRIDGE);
 + if (pdev-pcie_type != PCI_EXP_TYPE_PCI_BRIDGE) {
 + /* Busted hardware? */
 + WARN_ON_ONCE(1);
 + return NULL;
 + }
   return pdev;
   }
  
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

2007-06-26 Thread Keshavamurthy, Anil S

On Tue, Jun 26, 2007 at 11:11:25AM -0400, Muli Ben-Yehuda wrote:
 On Tue, Jun 26, 2007 at 08:03:59AM -0700, Arjan van de Ven wrote:
  Muli Ben-Yehuda wrote:
  How much? we have numbers (to be presented at OLS later this week)
  that show that on bare-metal an IOMMU can cost as much as 15%-30% more
  CPU utilization for an IO intensive workload (netperf). It will be
  interesting to see comparable numbers for VT-d.
  
  for VT-d it is a LOT less. I'll let anil give you his data :)
 
 Looking forward to it. Note that this is on a large SMP machine with
 Gigabit ethernet, with netperf TCP stream. Comparing numbers for other
 benchmarks on other machines is ... less than useful, but the numbers
 themeselves are interesting.
Our initial benchmark results showed we had around 3% extra CPU 
utilization overhead when compared to native(i.e without IOMMU).
Again, our benchmark was on small SMP machine and we used
iperf and a 1G ethernet cards.

Going forward we will do more benchmark tests and will share the
results.

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 04/10] IOVA allocation and management routines

2007-06-26 Thread Keshavamurthy, Anil S

On Mon, Jun 25, 2007 at 11:07:47PM -0700, Andrew Morton wrote:
 On Tue, 19 Jun 2007 14:37:05 -0700 Keshavamurthy, Anil S [EMAIL 
 PROTECTED] wrote:
 
 All the inlines in this code are pretty pointless: all those functions have
 a single callsite so the compiler inlines them anyway.  If we later add
 more callsites for these functions, they're too big to be inlined.
 
 inline is usually wrong: don't do it!
Yup, I agree and will follow in future.

  +
  +/**
  + * find_iova - find's an iova for a given pfn
  + * @iovad - iova domain in question.
  + * pfn - page frame number
  + * This function finds and returns an iova belonging to the
  + * given doamin which matches the given pfn.
  + */
  +struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn)
  +{
  +   unsigned long flags;
  +   struct rb_node *node;
  +
  +   spin_lock_irqsave(iovad-iova_rbtree_lock, flags);
  +   node = iovad-rbroot.rb_node;
  +   while (node) {
  +   struct iova *iova = container_of(node, struct iova, node);
  +
  +   /* If pfn falls within iova's range, return iova */
  +   if ((pfn = iova-pfn_lo)  (pfn = iova-pfn_hi)) {
  +   spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags);
  +   return iova;
  +   }
  +
  +   if (pfn  iova-pfn_lo)
  +   node = node-rb_left;
  +   else if (pfn  iova-pfn_lo)
  +   node = node-rb_right;
  +   }
  +
  +   spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags);
  +   return NULL;
  +}
 
 So we take the lock, look up an item, then drop the lock then return the
 item we just found.  We took no refcount on it and we didn't do anything to
 keep this object alive.
 
 Is that a bug, or does the (afacit undocumented) lifecycle management of
 these things take care of it in some manner?  If yes, please reply via an
 add-a-comment patch.

Nope, this is not a bug. Adding a comment patch which explains the same.

 
 
  +/**
  + * __free_iova - frees the given iova
  + * @iovad: iova domain in question.
  + * @iova: iova in question.
  + * Frees the given iova belonging to the giving domain
  + */
  +void
  +__free_iova(struct iova_domain *iovad, struct iova *iova)
  +{
  +   unsigned long flags;
  +
  +   if (iova) {
  +   spin_lock_irqsave(iovad-iova_rbtree_lock, flags);
  +   __cached_rbnode_delete_update(iovad, iova);
  +   rb_erase(iova-node, iovad-rbroot);
  +   spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags);
  +   free_iova_mem(iova);
  +   }
  +}
 

 Can this really be called with NULL?  If so, under what circumstances? 
 (This reader couldn't work it out from a brief look at the code, so perhaps
 others will not be able to either.  Perhaps a comment is needed)

It was getting called from only one place free_iova().
Below patch address your concern.

 
  +/**
  + * free_iova - finds and frees the iova for a given pfn
  + * @iovad: - iova domain in question.
  + * @pfn: - pfn that is allocated previously
  + * This functions finds an iova for a given pfn and then
  + * frees the iova from that domain.
  + */
  +void
  +free_iova(struct iova_domain *iovad, unsigned long pfn)
  +{
  +   struct iova *iova = find_iova(iovad, pfn);
  +   __free_iova(iovad, iova);
  +
  +}
  +
  +/**
  + * put_iova_domain - destroys the iova doamin
  + * @iovad: - iova domain in question.
  + * All the iova's in that domain are destroyed.
  + */
  +void put_iova_domain(struct iova_domain *iovad)
  +{
  +   struct rb_node *node;
  +   unsigned long flags;
  +
  +   spin_lock_irqsave(iovad-iova_rbtree_lock, flags);
  +   node = rb_first(iovad-rbroot);
  +   while (node) {
  +   struct iova *iova = container_of(node, struct iova, node);
  +   rb_erase(node, iovad-rbroot);
  +   free_iova_mem(iova);
  +   node = rb_first(iovad-rbroot);
  +   }
  +   spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags);
  +}
 
 Right, so I suspect what's happening here is that all iova's remain valid
 until their entire domain is destroyed, yes?

Nope. IOVA are valid only for the duration of DMA MAP and DMA UNMAP calls.
In case of Intel-iommu driver, the iova's are valid only for the duration
of __intel_map_singl() and __intel_unmap_single() calls.

 
 What is the upper bound to the memory consumpotion here, and what provides
 it?
As explained above, iova are freed and reused again during the DMA map calls.

 
 Again, some code comments about these design issues are appropriate.
 
  +/*
  + * We need a fixed PAGE_SIZE of 4K irrespective of
  + * arch PAGE_SIZE for IOMMU page tables.
  + */
  +#define PAGE_SHIFT_4K  (12)
  +#define PAGE_SIZE_4K   (1UL  PAGE_SHIFT_4K)
  +#define PAGE_MASK_4K   (((u64)-1)  PAGE_SHIFT_4K)
  +#define PAGE_ALIGN_4K(addr)(((addr) + PAGE_SIZE_4K - 1)  
  PAGE_MASK_4K)
 
 Am still wondering why we cannot use PAGE_SIZE, PAGE_SHIFT, etc here.
VT-d hardware(a.k.a Intel IOMMU

Re: [Intel IOMMU 10/10] Iommu floppy workaround

2007-06-26 Thread Keshavamurthy, Anil S

On Mon, Jun 25, 2007 at 11:42:22PM -0700, Andrew Morton wrote:
 On Tue, 19 Jun 2007 14:37:11 -0700 Keshavamurthy, Anil S [EMAIL 
 PROTECTED] wrote:
 
 Bit weird that this was implemented in the header like that.
Sorry, it is my mistake as I understood thus from your previous
code review comment.
 
 How about this?  (Also contains rather a lot of obvious style fixes)
Yup, looks good.

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 05/10] Intel IOMMU driver

2007-06-26 Thread Keshavamurthy, Anil S

On Mon, Jun 25, 2007 at 11:32:49PM -0700, Andrew Morton wrote:
 On Tue, 19 Jun 2007 16:32:23 -0700 (PDT) Christoph Lameter [EMAIL 
 PROTECTED] wrote:
 
  On Tue, 19 Jun 2007, Keshavamurthy, Anil S wrote:
  
   +static inline void *alloc_pgtable_page(void)
   +{
   + return (void *)get_zeroed_page(GFP_ATOMIC);
   +}
  
  Need to pass gfp_t parameter. Repeates a couple of times.
  ...
  Is it not possible here to drop the lock and do the alloc with GFP_KERNEL 
  and deal with the resulting race? That is done in other parts of the 
  kernel.
  ...
  This may be able to become a GFP_KERNEL alloc since interrupts are enabled 
  at this point?
  ...
  GFP_KERNEL alloc possible?
  
 
 Yeah, if there are any callsites at all at which we know that we can
 perform a sleeping allocation, Christoph's suggestions should be adopted. 
 Because even a bare GFP_NOIO is heaps more robust than GFP_ATOMIC, and it
 will also reload the free-pages reserves, making subsequent GFP_ATOMIC
 allocations more likely to succeed.
Yup, will do as part of making this code work for IA64, which is my next
item in my todo list.

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 05/10] Intel IOMMU driver

2007-06-26 Thread Keshavamurthy, Anil S

On Mon, Jun 25, 2007 at 11:25:11PM -0700, Andrew Morton wrote:
 On Tue, 19 Jun 2007 14:37:06 -0700 Keshavamurthy, Anil S [EMAIL 
 PROTECTED] wrote:
 
 
 None of these actually _need_ to be macros and it would be better to 
 implement them in C.  That way things are more self-documenting, more
 pleasant to read, more likely to get commented and you'll fix the
 two bugs wherein the argument to a macro is evaluated more than once.
Agree. Will send a patch when I get back from OLS as any changes to this
needs thorugh testing.

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 10/10] Iommu floppy workaround

2007-06-26 Thread Keshavamurthy, Anil S

On Tue, Jun 26, 2007 at 12:37:55PM +0200, Andi Kleen wrote:
 
   Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig
   ===
   --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-18 
   15:45:08.0 -0700
   +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig  2007-06-18 
   15:45:09.0 -0700
   @@ -752,6 +752,16 @@
  all the OS visible memory. Hence the driver can continue
  to use physical addresses for DMA.

   +config DMAR_FLPY_WA
  
  FLOPPY is spelled FLOPPY!
 
 Also this shouldn't be a user visible config.  The floppy driver should just
 do this transparently when loaded and undo when unloaded.

Yup, I agree. 

Here goes the patch to make it user invisible config option.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]
---
 arch/x86_64/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig   2007-06-26 
12:04:42.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-26 12:06:01.0 
-0700
@@ -753,7 +753,7 @@
 to use physical addresses for DMA.
 
 config DMAR_FLOPPY_WA
-   bool Support for Floppy disk workaround
+   bool
depends on DMAR
default y
help

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

2007-06-21 Thread Keshavamurthy, Anil S

On Thu, Jun 21, 2007 at 09:13:11AM +0200, Peter Zijlstra wrote:
> On Wed, 2007-06-20 at 23:37 -0700, Keshavamurthy, Anil S wrote:
> > On Thu, Jun 21, 2007 at 08:29:34AM +0200, Peter Zijlstra wrote:
> > > On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote:
> > > > Peter Zijlstra wrote:
> > > Also, the other iommu code you pointed me to, was happy to fail, it did
> > > not attempt to use the emergency reserves.
> > 
> > Is the same behavior acceptable here?
> 
> I would say it is. Failure is a part of life.
> 
> If you have a (small) mempool with 16 pages or so, that should give you
> plenty megabytes of io-space to get out of a tight spot. That is, you
> can queue many pages with that. If it is depleted you know you have at
> least that many pages outstanding. So failing will just delay the next
> pages.
> 
> Throughput is not a key issue when that low on memory, a guarantee of
> progress is.

Andrew,
Can you please queue all the other patches except this one for your
next MM release? (Yes, We can safely drop this patch without any issues in 
applying
rest of the patches).


-thanks,
Anil Keshavamurthy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

2007-06-21 Thread Keshavamurthy, Anil S

On Thu, Jun 21, 2007 at 08:29:34AM +0200, Peter Zijlstra wrote:
> On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote:
> > Peter Zijlstra wrote:
> Also, the other iommu code you pointed me to, was happy to fail, it did
> not attempt to use the emergency reserves.

Is the same behavior acceptable here?

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

2007-06-21 Thread Keshavamurthy, Anil S

On Wed, Jun 20, 2007 at 11:11:05PM -0700, Arjan van de Ven wrote:
> Peter Zijlstra wrote:
> >What I'm saying is that if you do use the reserves, you should ensure
> >the use is bounded. I'm not seeing anything like that.
> 
> each mapping takes at most 3 pages
With 3 pages(3 level page table), IOMMU can map at 
most 2MB and each additional last level page helps
map another 2MB. Again, the IOMMU driver re-uses the
virtual address instead of giving contiguous virtual address
which helps to keep the page table growth and helps
reuse the same page table entries, in that sense we are
bounded, again we are not sure how much IO will be in flight
on a perticular system so as to reserve that many pages for
IOMMU page table setup.

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

2007-06-21 Thread Keshavamurthy, Anil S

On Wed, Jun 20, 2007 at 11:11:05PM -0700, Arjan van de Ven wrote:
 Peter Zijlstra wrote:
 What I'm saying is that if you do use the reserves, you should ensure
 the use is bounded. I'm not seeing anything like that.
 
 each mapping takes at most 3 pages
With 3 pages(3 level page table), IOMMU can map at 
most 2MB and each additional last level page helps
map another 2MB. Again, the IOMMU driver re-uses the
virtual address instead of giving contiguous virtual address
which helps to keep the page table growth and helps
reuse the same page table entries, in that sense we are
bounded, again we are not sure how much IO will be in flight
on a perticular system so as to reserve that many pages for
IOMMU page table setup.

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

2007-06-21 Thread Keshavamurthy, Anil S

On Thu, Jun 21, 2007 at 08:29:34AM +0200, Peter Zijlstra wrote:
 On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote:
  Peter Zijlstra wrote:
 Also, the other iommu code you pointed me to, was happy to fail, it did
 not attempt to use the emergency reserves.

Is the same behavior acceptable here?

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

2007-06-21 Thread Keshavamurthy, Anil S

On Thu, Jun 21, 2007 at 09:13:11AM +0200, Peter Zijlstra wrote:
 On Wed, 2007-06-20 at 23:37 -0700, Keshavamurthy, Anil S wrote:
  On Thu, Jun 21, 2007 at 08:29:34AM +0200, Peter Zijlstra wrote:
   On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote:
Peter Zijlstra wrote:
   Also, the other iommu code you pointed me to, was happy to fail, it did
   not attempt to use the emergency reserves.
  
  Is the same behavior acceptable here?
 
 I would say it is. Failure is a part of life.
 
 If you have a (small) mempool with 16 pages or so, that should give you
 plenty megabytes of io-space to get out of a tight spot. That is, you
 can queue many pages with that. If it is depleted you know you have at
 least that many pages outstanding. So failing will just delay the next
 pages.
 
 Throughput is not a key issue when that low on memory, a guarantee of
 progress is.

Andrew,
Can you please queue all the other patches except this one for your
next MM release? (Yes, We can safely drop this patch without any issues in 
applying
rest of the patches).


-thanks,
Anil Keshavamurthy
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

2007-06-20 Thread Keshavamurthy, Anil S

On Wed, Jun 20, 2007 at 10:08:51PM +0200, Peter Zijlstra wrote:
> On Wed, 2007-06-20 at 12:14 -0700, Arjan van de Ven wrote:
> > Peter Zijlstra wrote:
> > > So a reclaim context (kswapd and direct reclaim) set PF_MEMALLOC to
> > > ensure they themselves will not block on a memory allocation. And it is
> > > understood that these code paths have a bounded memory footprint.
> > 
> > 
> > that's a too simplistic view though; what happens is that kswapd will 
> > queue the IO, but the irq context will then take the IO from the queue 
> > and do the DMA mapping... which needs the memory.

As Arjan is saying, that a reclaim context sets PF_MEMALLOC flag
and submits the IO, but the controller driver decides to queue the IO,
and later in the interrupt context it de-queues and calls the
IOMMU driver for mapping the DMA physical address and in this DMA 
map api call we may need the memory to satisfy the DMA map api call.
Hence PF_MEMALLOC set by the reclaim context should work from 
interrupt context too, if it is not then that needs to be fixed.

> 
> Right, but who stops some unrelated interrupt handler from completely
> depleting memory?
The DMA map API's exposed by IOMMU are called by the 
storage/network controller driver and it is in this IOMMU
driver we are talking about allocating memory. So we have no 
clue how much I/O the controller above us is capable of submitting. 
And all we do is the mapping of DMA phyical address and 
provide the caller with the virtual DMA address and it
is in this process we may need some memory.

> 
> What I'm saying is that there should be some coupling between the
> reclaim context and the irq context doing work on its behalf.
Nope this info is not available when upper level drivers
calls the standard DMA map api's.

> 
> For instance, you know how many pages are in the queue, and which queue.
> So you could preallocate enough memory to handle that many pages from
> irq context and couple that reserve to the queue object. Then irq
> context can use that memory to do the work.

As I have said earlier, we are not the storage or network controller
driver and we have no idea how much the IO the controller is capable of.

Again this patch is the best effort for not to fail the DMA map api,
if it fails due to lack of memory, we have no choice but to return
failure to upper level driver. But today, upper level drivers are
not capable of handling failures from DMA map api's hence this best
effort for not to fail the DMA map api call.

thanks,
-Anil

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

2007-06-20 Thread Keshavamurthy, Anil S

On Wed, Jun 20, 2007 at 10:08:51PM +0200, Peter Zijlstra wrote:
 On Wed, 2007-06-20 at 12:14 -0700, Arjan van de Ven wrote:
  Peter Zijlstra wrote:
   So a reclaim context (kswapd and direct reclaim) set PF_MEMALLOC to
   ensure they themselves will not block on a memory allocation. And it is
   understood that these code paths have a bounded memory footprint.
  
  
  that's a too simplistic view though; what happens is that kswapd will 
  queue the IO, but the irq context will then take the IO from the queue 
  and do the DMA mapping... which needs the memory.

As Arjan is saying, that a reclaim context sets PF_MEMALLOC flag
and submits the IO, but the controller driver decides to queue the IO,
and later in the interrupt context it de-queues and calls the
IOMMU driver for mapping the DMA physical address and in this DMA 
map api call we may need the memory to satisfy the DMA map api call.
Hence PF_MEMALLOC set by the reclaim context should work from 
interrupt context too, if it is not then that needs to be fixed.

 
 Right, but who stops some unrelated interrupt handler from completely
 depleting memory?
The DMA map API's exposed by IOMMU are called by the 
storage/network controller driver and it is in this IOMMU
driver we are talking about allocating memory. So we have no 
clue how much I/O the controller above us is capable of submitting. 
And all we do is the mapping of DMA phyical address and 
provide the caller with the virtual DMA address and it
is in this process we may need some memory.

 
 What I'm saying is that there should be some coupling between the
 reclaim context and the irq context doing work on its behalf.
Nope this info is not available when upper level drivers
calls the standard DMA map api's.

 
 For instance, you know how many pages are in the queue, and which queue.
 So you could preallocate enough memory to handle that many pages from
 irq context and couple that reserve to the queue object. Then irq
 context can use that memory to do the work.

As I have said earlier, we are not the storage or network controller
driver and we have no idea how much the IO the controller is capable of.


Again this patch is the best effort for not to fail the DMA map api,
if it fails due to lack of memory, we have no choice but to return
failure to upper level driver. But today, upper level drivers are
not capable of handling failures from DMA map api's hence this best
effort for not to fail the DMA map api call.

thanks,
-Anil


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel IOMMU 05/10] Intel IOMMU driver

2007-06-19 Thread Keshavamurthy, Anil S

On Tue, Jun 19, 2007 at 04:32:23PM -0700, Christoph Lameter wrote:
> On Tue, 19 Jun 2007, Keshavamurthy, Anil S wrote:
> 
> > +static inline void *alloc_pgtable_page(void)
> > +{
> > +   return (void *)get_zeroed_page(GFP_ATOMIC);
> > +}
> 
> Need to pass gfp_t parameter. Repeates a couple of times.
> 
> > +   addr &= (((u64)1) << addr_width) - 1;
> > +   parent = domain->pgd;
> > +
> > +   spin_lock_irqsave(>mapping_lock, flags);
> > +   while (level > 0) {
> > +   void *tmp_page;
> > +
> > +   offset = address_level_offset(addr, level);
> > +   pte = [offset];
> > +   if (level == 1)
> > +   break;
> > +
> > +   if (!dma_pte_present(*pte)) {
> > +   tmp_page = alloc_pgtable_page();
> 
> Is it not possible here to drop the lock and do the alloc with GFP_KERNEL 
> and deal with the resulting race? That is done in other parts of the 
> kernel.
> 
> > +/* iommu handling */
> > +static int iommu_alloc_root_entry(struct intel_iommu *iommu)
> > +{
> > +   struct root_entry *root;
> > +   unsigned long flags;
> > +
> > +   root = (struct root_entry *)alloc_pgtable_page();
> 
> This may be able to become a GFP_KERNEL alloc since interrupts are enabled 
> at this point?

Memory allocated during driver init is very less and not much benefit
with the suggested changes I think. Please correct me If I am wrong.

The biggest benifit will be when we can figure out GPF_ flags
during runtime when DMA map api's are called. 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 10/10] Iommu floppy workaround

2007-06-19 Thread Keshavamurthy, Anil S

This config option (DMAR_FLPY_WA) sets up 1:1 mapping for the
floppy device so that the floppy device which does not use
DMA api's will continue to work. 

Once the floppy driver starts using DMA api's this config option
can be turn off or this patch can be yanked out of kernel at that
time.


Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>
---
 arch/x86_64/Kconfig   |   10 ++
 drivers/pci/intel-iommu.c |   22 ++
 drivers/pci/intel-iommu.h |7 +++
 3 files changed, 39 insertions(+)

Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig   2007-06-18 
15:45:08.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-18 15:45:09.0 
-0700
@@ -752,6 +752,16 @@
 all the OS visible memory. Hence the driver can continue
 to use physical addresses for DMA.
 
+config DMAR_FLPY_WA
+   bool "Support for Floppy disk workaround"
+   depends on DMAR
+   default y
+   help
+Floppy disk drivers are know to by pass dma api calls
+their by failing to work when IOMMU is enabled. This
+work around will setup a 1 to 1 mappings for the first
+16M to make floppy(isa device) work.
+
 source "drivers/pci/pcie/Kconfig"
 
 source "drivers/pci/Kconfig"
Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 
15:45:08.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c  2007-06-18 
15:45:09.0 -0700
@@ -1631,6 +1631,26 @@
 }
 #endif
 
+#ifdef CONFIG_DMAR_FLPY_WA
+static inline void iommu_prepare_isa(void)
+{
+   struct pci_dev *pdev = NULL;
+   int ret;
+
+   pdev = pci_get_class (PCI_CLASS_BRIDGE_ISA << 8, NULL);
+   if (!pdev)
+   return;
+
+   printk (KERN_INFO "IOMMU: Prepare 0-16M unity mapping for LPC\n");
+   ret = iommu_prepare_identity_map(pdev, 0, 16*1024*1024);
+
+   if (ret)
+   printk ("IOMMU: Failed to create 0-64M identity map, \
+   Floppy might not work\n");
+
+}
+#endif
+
 int __init init_dmars(void)
 {
struct dmar_drhd_unit *drhd;
@@ -1696,6 +1716,8 @@
 
iommu_prepare_gfx_mapping();
 
+   iommu_prepare_isa();
+
/*
 * for each drhd
 *   enable fault log
Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.h
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.h 2007-06-18 
15:45:08.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.h  2007-06-18 
15:45:09.0 -0700
@@ -322,4 +322,11 @@
 }
 #endif /* !CONFIG_DMAR_GFX_WA */
 
+#ifndef CONFIG_DMAR_FLPY_WA
+static inline void iommu_prepare_isa(void)
+{
+   return;
+}
+#endif /* !CONFIG_DMAR_FLPY_WA */
+
 #endif

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 08/10] DMAR fault handling support

2007-06-19 Thread Keshavamurthy, Anil S

MSI interrupt handler registrations and fault handling support
for Intel-IOMMU hadrware.

This patch enables the MSI interrupts for the DMA remapping units
and in the interrupt handler read the fault cause and outputs the
same on to the console.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>
---
 Documentation/Intel-IOMMU.txt |   17 +++
 arch/x86_64/kernel/io_apic.c  |   59 
 drivers/pci/intel-iommu.c |  194 ++
 include/linux/dmar.h  |   12 ++
 4 files changed, 281 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt
===
--- linux-2.6.22-rc4-mm2.orig/Documentation/Intel-IOMMU.txt 2007-06-18 
15:45:46.0 -0700
+++ linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt  2007-06-19 
13:05:03.0 -0700
@@ -63,6 +63,15 @@
 The same is true for peer to peer transactions. Hence we reserve the
 address from PCI MMIO ranges so they are not allocated for IOVA addresses.
 
+
+Fault reporting
+---
+When errors are reported, the DMA engine signals via an interrupt. The fault
+reason and device that caused it with fault reason is printed on console.
+
+See below for sample.
+
+
 Boot Message Sample
 ---
 
@@ -85,6 +94,14 @@
 
 PCI-DMA: Using DMAR IOMMU
 
+Fault reporting
+---
+
+DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
+DMAR:[fault reason 05] PTE Write access is not set
+DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
+DMAR:[fault reason 05] PTE Write access is not set
+
 TBD
 
 
Index: linux-2.6.22-rc4-mm2/arch/x86_64/kernel/io_apic.c
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/kernel/io_apic.c  2007-06-18 
15:45:38.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/kernel/io_apic.c   2007-06-18 
15:45:46.0 -0700
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef CONFIG_ACPI
 #include 
 #endif
@@ -2026,8 +2027,64 @@
destroy_irq(irq);
 }
 
-#endif /* CONFIG_PCI_MSI */
+#ifdef CONFIG_DMAR
+#ifdef CONFIG_SMP
+static void dmar_msi_set_affinity(unsigned int irq, cpumask_t mask)
+{
+   struct irq_cfg *cfg = irq_cfg + irq;
+   struct msi_msg msg;
+   unsigned int dest;
+   cpumask_t tmp;
+
+   cpus_and(tmp, mask, cpu_online_map);
+   if (cpus_empty(tmp))
+   return;
+
+   if (assign_irq_vector(irq, mask))
+   return;
+
+   cpus_and(tmp, cfg->domain, mask);
+   dest = cpu_mask_to_apicid(tmp);
+
+   dmar_msi_read(irq, );
+
+   msg.data &= ~MSI_DATA_VECTOR_MASK;
+   msg.data |= MSI_DATA_VECTOR(cfg->vector);
+   msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
+   msg.address_lo |= MSI_ADDR_DEST_ID(dest);
+
+   dmar_msi_write(irq, );
+   irq_desc[irq].affinity = mask;
+}
+#endif /* CONFIG_SMP */
+
+struct irq_chip dmar_msi_type = {
+   .name = "DMAR_MSI",
+   .unmask = dmar_msi_unmask,
+   .mask = dmar_msi_mask,
+   .ack = ack_apic_edge,
+#ifdef CONFIG_SMP
+   .set_affinity = dmar_msi_set_affinity,
+#endif
+   .retrigger = ioapic_retrigger_irq,
+};
+
+int arch_setup_dmar_msi(unsigned int irq)
+{
+   int ret;
+   struct msi_msg msg;
+
+   ret = msi_compose_msg(NULL, irq, );
+   if (ret < 0)
+   return ret;
+   dmar_msi_write(irq, );
+   set_irq_chip_and_handler_name(irq, _msi_type, handle_edge_irq,
+   "edge");
+   return 0;
+}
+#endif
 
+#endif /* CONFIG_PCI_MSI */
 /*
  * Hypertransport interrupt support
  */
Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 
15:45:46.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c  2007-06-19 
13:05:03.0 -0700
@@ -742,6 +742,196 @@
return 0;
 }
 
+/* iommu interrupt handling. Most stuff are MSI-like. */
+
+static char *fault_reason_strings[] =
+{
+   "Software",
+   "Present bit in root entry is clear",
+   "Present bit in context entry is clear",
+   "Invalid context entry",
+   "Access beyond MGAW",
+   "PTE Write access is not set",
+   "PTE Read access is not set",
+   "Next page table ptr is invalid",
+   "Root table address invalid",
+   "Context table ptr is invalid",
+   "non-zero reserved fields in RTP",
+   "non-zero reserved fields in CTP",
+   "non-zero reserved fields in PTE",
+   "Unknown"
+};
+#define MAX_FAULT_REASON_IDX   ARRAY_SIZE(fault_reason_strings)
+
+char *dmar_get_fault_reason(u8 fault_reason)
+{
+   if (fault_reason > MAX_FAULT_REASON_IDX)
+   return fault_reason_strings[MAX_FAULT_REASON_IDX];
+   else
+   return fault_reason_strings[fault_reason];
+}
+
+void

[Intel IOMMU 04/10] IOVA allocation and management routines

2007-06-19 Thread Keshavamurthy, Anil S

This code implements a generic IOVA allocation and 
management. As per Dave's suggestion we are now allocating
IO virtual address from Higher DMA limit address rather
than lower end address and this eliminated the need to preserve
the IO virtual address for multiple devices sharing the same
domain virtual address.

Also this code uses red black trees to store the allocated and
reserved iova nodes. This showed a good performance improvements
over previous linear linked list.

Changes from previous posting:
1) Fixed mostly coding style issues

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>
---
 drivers/pci/iova.c |  351 +
 drivers/pci/iova.h |   62 +
 2 files changed, 413 insertions(+)

Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc4-mm2/drivers/pci/iova.c 2007-06-18 15:45:46.0 
-0700
@@ -0,0 +1,351 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Copyright (C) 2006 Anil S Keshavamurthy <[EMAIL PROTECTED]>
+ */
+
+#include "iova.h"
+
+void
+init_iova_domain(struct iova_domain *iovad)
+{
+   spin_lock_init(>iova_alloc_lock);
+   spin_lock_init(>iova_rbtree_lock);
+   iovad->rbroot = RB_ROOT;
+   iovad->cached32_node = NULL;
+
+}
+
+static struct rb_node *
+__get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
+{
+   if ((*limit_pfn != DMA_32BIT_PFN) ||
+   (iovad->cached32_node == NULL))
+   return rb_last(>rbroot);
+   else {
+   struct rb_node *prev_node = rb_prev(iovad->cached32_node);
+   struct iova *curr_iova =
+   container_of(iovad->cached32_node, struct iova, node);
+   *limit_pfn = curr_iova->pfn_lo - 1;
+   return prev_node;
+   }
+}
+
+static inline void
+__cached_rbnode_insert_update(struct iova_domain *iovad,
+   unsigned long limit_pfn, struct iova *new)
+{
+   if (limit_pfn != DMA_32BIT_PFN)
+   return;
+   iovad->cached32_node = >node;
+}
+
+static inline void
+__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
+{
+   struct iova *cached_iova;
+   struct rb_node *curr;
+
+   if (!iovad->cached32_node)
+   return;
+   curr = iovad->cached32_node;
+   cached_iova = container_of(curr, struct iova, node);
+
+   if (free->pfn_lo >= cached_iova->pfn_lo)
+   iovad->cached32_node = rb_next(>node);
+}
+
+static inline int __alloc_iova_range(struct iova_domain *iovad,
+   unsigned long size, unsigned long limit_pfn, struct iova *new)
+{
+   struct rb_node *curr = NULL;
+   unsigned long flags;
+   unsigned long saved_pfn;
+
+   /* Walk the tree backwards */
+   spin_lock_irqsave(>iova_rbtree_lock, flags);
+   saved_pfn = limit_pfn;
+   curr = __get_cached_rbnode(iovad, _pfn);
+   while (curr) {
+   struct iova *curr_iova = container_of(curr, struct iova, node);
+   if (limit_pfn < curr_iova->pfn_lo)
+   goto move_left;
+   if (limit_pfn < curr_iova->pfn_hi)
+   goto adjust_limit_pfn;
+   if ((curr_iova->pfn_hi + size) <= limit_pfn)
+   break;  /* found a free slot */
+adjust_limit_pfn:
+   limit_pfn = curr_iova->pfn_lo - 1;
+move_left:
+   curr = rb_prev(curr);
+   }
+
+   if ((!curr) && !(IOVA_START_PFN + size <= limit_pfn)) {
+   spin_unlock_irqrestore(>iova_rbtree_lock, flags);
+   return -ENOMEM;
+   }
+   new->pfn_hi = limit_pfn;
+   new->pfn_lo = limit_pfn - size + 1;
+
+   spin_unlock_irqrestore(>iova_rbtree_lock, flags);
+   return 0;
+}
+
+static void
+iova_insert_rbtree(struct rb_root *root, struct iova *iova)
+{
+   struct rb_node **new = &(root->rb_node), *parent = NULL;
+   /* Figure out where to put new node */
+   while (*new) {
+   struct iova *this = container_of(*new, struct iova, node);
+   parent = *new;
+
+   if (iova->pfn_lo < this->pfn_lo)
+   new = &((*new)->rb_left);
+   else if (iova->pfn_lo > this->pfn_lo)
+   new = &((*new)->rb_right);
+   else
+   BUG(); /* this should not happen */
+   }
+   /* Add new node and rebalance tree. */
+   rb_link_node(>node, parent, new);
+   rb_insert_color(>node, root);
+}
+
+/**
+ * alloc_iova - allocates an iova
+ * @iovad - iova domain in question
+ * @size - size of page frames to allocate
+ * @limit_pfn - max limit address
+ * This function allocates an iova in the range limit_pfn to IOVA_START_PFN
+ * looking from limit_pfn instead from IOVA_START_PFN.
+ */

[Intel IOMMU 09/10] Iommu Gfx workaround

2007-06-19 Thread Keshavamurthy, Anil S

When we fix all the opensource gfx drivers to use the DMA api's,
at that time we can yank this config options out.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>
---
 Documentation/Intel-IOMMU.txt |5 +
 arch/x86_64/Kconfig   |   11 +++
 arch/x86_64/kernel/e820.c |   19 +++
 drivers/pci/intel-iommu.c |   33 +
 drivers/pci/intel-iommu.h |7 +++
 5 files changed, 75 insertions(+)

Index: linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt
===
--- linux-2.6.22-rc4-mm2.orig/Documentation/Intel-IOMMU.txt 2007-06-18 
15:45:08.0 -0700
+++ linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt  2007-06-18 
15:45:08.0 -0700
@@ -57,6 +57,11 @@
 If you encounter issues with graphics devices, you can try adding
 option intel_iommu=igfx_off to turn off the integrated graphics engine.
 
+If it happens to be a PCI device included in the INCLUDE_ALL Engine,
+then try enabling CONFIG_DMAR_GFX_WA to setup a 1-1 map. We hear
+graphics drivers may be in process of using DMA api's in the near
+future and at that time this option can be yanked out.
+
 Some exceptions to IOVA
 ---
 Interrupt ranges are not address translated, (0xfee0 - 0xfeef).
Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig   2007-06-18 
15:45:07.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-18 15:45:08.0 
-0700
@@ -741,6 +741,17 @@
  and includes pci device scope covered by these DMA
  remapping device.
 
+config DMAR_GFX_WA
+   bool "Support for Graphics workaround"
+   depends on DMAR
+   default y
+   help
+Current Graphics drivers tend to use physical address
+for DMA and avoid using DMA api's. Setting this config
+option permits the IOMMU driver to set a unity map for
+all the OS visible memory. Hence the driver can continue
+to use physical addresses for DMA.
+
 source "drivers/pci/pcie/Kconfig"
 
 source "drivers/pci/Kconfig"
Index: linux-2.6.22-rc4-mm2/arch/x86_64/kernel/e820.c
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/kernel/e820.c 2007-06-18 
15:44:44.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/kernel/e820.c  2007-06-18 
15:45:08.0 -0700
@@ -723,3 +723,22 @@
printk(KERN_INFO "Allocating PCI resources starting at %lx (gap: 
%lx:%lx)\n",
pci_mem_start, gapstart, gapsize);
 }
+
+int __init arch_get_ram_range(int slot, u64 *addr, u64 *size)
+{
+   int i;
+
+   if (slot < 0 || slot >= e820.nr_map)
+   return -1;
+   for (i = slot; i < e820.nr_map; i++) {
+   if (e820.map[i].type != E820_RAM)
+   continue;
+   break;
+   }
+   if (i == e820.nr_map || e820.map[i].addr > (max_pfn << PAGE_SHIFT))
+   return -1;
+   *addr = e820.map[i].addr;
+   *size = min_t(u64, e820.map[i].size + e820.map[i].addr,
+   max_pfn << PAGE_SHIFT) - *addr;
+   return i + 1;
+}
Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 
15:45:08.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c  2007-06-18 
15:45:08.0 -0700
@@ -1601,6 +1601,36 @@
rmrr->end_address + 1);
 }
 
+#ifdef CONFIG_DMAR_GFX_WA
+extern int arch_get_ram_range(int slot, u64 *addr, u64 *size);
+static void __init iommu_prepare_gfx_mapping(void)
+{
+   struct pci_dev *pdev = NULL;
+   u64 base, size;
+   int slot;
+   int ret;
+
+   for_each_pci_dev(pdev) {
+   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO ||
+   !IS_GFX_DEVICE(pdev))
+   continue;
+   printk(KERN_INFO "IOMMU: gfx device %s 1-1 mapping\n",
+   pci_name(pdev));
+   slot = arch_get_ram_range(0, , );
+   while (slot >= 0) {
+   ret = iommu_prepare_identity_map(pdev,
+   base, base + size);
+   if (ret)
+   goto error;
+   slot = arch_get_ram_range(slot, , );
+   }
+   continue;
+error:
+   printk(KERN_ERR "IOMMU: mapping reserved region failed\n");
+   }
+}
+#endif
+
 int __init init_dmars(void)
 {
struct dmar_drhd_unit *drhd;
@@ -1664,6 +1694,8 @@
}
}
 
+   iommu_prepare_gfx_mapping();
+
/*
 * for each drhd
 *   enable fault log
@@ -2175,3 +2207,4 @@
dma_ops =

[Intel IOMMU 07/10] Intel iommu cmdline option - forcedac

2007-06-19 Thread Keshavamurthy, Anil S

Introduce intel_iommu=forcedac commandline option.
This option is helpful to verify the pci device capability
of handling physical dma'able address greater than 4G.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>
---
 Documentation/kernel-parameters.txt |7 +++
 drivers/pci/intel-iommu.c   |7 ++-
 2 files changed, 13 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc4-mm2/Documentation/kernel-parameters.txt
===
--- linux-2.6.22-rc4-mm2.orig/Documentation/kernel-parameters.txt   
2007-06-18 15:45:46.0 -0700
+++ linux-2.6.22-rc4-mm2/Documentation/kernel-parameters.txt2007-06-18 
15:45:46.0 -0700
@@ -788,6 +788,13 @@
bypassed by not enabling DMAR with this option. In
this case, gfx device will use physical address for
DMA.
+   forcedac [x86_64]
+   With this option iommu will not optimize to look
+   for io virtual address below 32 bit forcing dual
+   address cycle on pci bus for cards supporting greater
+   than 32 bit addressing. The default is to look
+   for translation below 32 bit and if not available
+   then look in the higher range.
 
io7=[HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 
15:45:46.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c  2007-06-19 
13:09:06.0 -0700
@@ -53,6 +53,7 @@
 
 static int dmar_disabled;
 static int __initdata dmar_map_gfx = 1;
+static int dmar_forcedac;
 
 #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
 static DEFINE_SPINLOCK(device_domain_lock);
@@ -70,6 +71,10 @@
dmar_map_gfx = 0;
printk(KERN_INFO
"Intel-IOMMU: disable GFX device mapping\n");
+   } else if (!strncmp(str, "forcedac", 8)) {
+   printk (KERN_INFO
+   "Intel-IOMMU: Forcing DAC for PCI devices\n");
+   dmar_forcedac = 1;
}
 
str += strcspn(str, ",");
@@ -1557,7 +1562,7 @@
 
start_addr = IOVA_START_ADDR;
 
-   if (pdev->dma_mask <= DMA_32BIT_MASK) {
+   if ((pdev->dma_mask <= DMA_32BIT_MASK) || (dmar_forcedac)) {
iova = iommu_alloc_iova(domain, addr, size, start_addr,
pdev->dma_mask);
} else  {

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 03/10] clflush_cache_range now takes size param

2007-06-19 Thread Keshavamurthy, Anil S

Introduce the size param for clflush_cache_range().

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 arch/x86_64/mm/pageattr.c   |6 +++---
 include/asm-x86_64/cacheflush.h |1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

Index: linux-2.6.22-rc4-mm2/arch/x86_64/mm/pageattr.c
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/mm/pageattr.c 2007-06-18 
15:45:39.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/mm/pageattr.c  2007-06-18 
15:45:46.0 -0700
@@ -61,10 +61,10 @@
return base;
 } 
 
-static void cache_flush_page(void *adr)
+void clflush_cache_range(void *adr, int size)
 {
int i;
-   for (i = 0; i < PAGE_SIZE; i += boot_cpu_data.x86_clflush_size)
+   for (i = 0; i < size; i += boot_cpu_data.x86_clflush_size)
asm volatile("clflush (%0)" :: "r" (adr + i));
 }
 
@@ -80,7 +80,7 @@
list_for_each_entry(pg, l, lru) {
void *adr = page_address(pg);
if (cpu_has_clflush)
-   cache_flush_page(adr);
+   clflush_cache_range(adr, PAGE_SIZE);
}
__flush_tlb_all();
 }
Index: linux-2.6.22-rc4-mm2/include/asm-x86_64/cacheflush.h
===
--- linux-2.6.22-rc4-mm2.orig/include/asm-x86_64/cacheflush.h   2007-06-18 
15:45:39.0 -0700
+++ linux-2.6.22-rc4-mm2/include/asm-x86_64/cacheflush.h2007-06-18 
15:45:46.0 -0700
@@ -27,6 +27,7 @@
 void global_flush_tlb(void); 
 int change_page_attr(struct page *page, int numpages, pgprot_t prot);
 int change_page_attr_addr(unsigned long addr, int numpages, pgprot_t prot);
+void clflush_cache_range(void *addr, int size);
 
 #ifdef CONFIG_DEBUG_RODATA
 void mark_rodata_ro(void);

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

2007-06-19 Thread Keshavamurthy, Anil S

Intel IOMMU driver needs memory during DMA map calls to setup its internal
page tables and for other data structures. As we all know that these DMA 
map calls are mostly called in the interrupt context or with the spinlock 
held by the upper level drivers(network/storage drivers), so in order to 
avoid any memory allocation failure due to low memory issues,
this patch makes memory allocation by temporarily setting PF_MEMALLOC
flags for the current task before making memory allocation calls.

We evaluated mempools as a backup when kmem_cache_alloc() fails
and found that mempools are really not useful here because
 1) We don;t know for sure how much to reserve in advance
 2) And mempools are not useful for GFP_ATOMIC case (as we call 
memory alloc functions with GFP_ATOMIC)


With PF_MEMALLOC flag set in the current->flags, the VM subsystem avoids
any watermark checks before allocating memory thus guarantee'ing the 
memory till the last free page. Further, looking at the code in 
mm/page_alloc.c in __alloc_pages() function, looks like this
flag is useful only in the non-interrupt context.

If we are in the interrupt context and memory allocation in IOMMU
driver fails for some reason, then the DMA map api's will return 
failure and it is up to the higher level drivers to retry. Suppose,
if upper level driver programs the controller with the buggy 
DMA virtual address, the IOMMU will block that DMA transaction
when that happens thus preventing any corruption to main memory.

So far in our test scenario, we were unable to create
any memory allocation failure inside dma map api calls.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 drivers/pci/intel-iommu.c |   30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 
15:45:46.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c  2007-06-19 
13:10:29.0 -0700
@@ -84,9 +84,31 @@
 static struct kmem_cache *iommu_devinfo_cache;
 static struct kmem_cache *iommu_iova_cache;
 
+static inline void *iommu_kmem_cache_alloc(struct kmem_cache *cachep)
+{
+   unsigned int flags;
+   void *vaddr;
+
+   /* trying to avoid low memory issues */
+   flags = current->flags & PF_MEMALLOC;
+   current->flags |= PF_MEMALLOC;
+   vaddr = kmem_cache_alloc(cachep, GFP_ATOMIC);
+   current->flags &= (~PF_MEMALLOC | flags);
+   return vaddr;
+}
+
+
 static inline void *alloc_pgtable_page(void)
 {
-   return (void *)get_zeroed_page(GFP_ATOMIC);
+   unsigned int flags;
+   void *vaddr;
+
+   /* trying to avoid low memory issues */
+   flags = current->flags & PF_MEMALLOC;
+   current->flags |= PF_MEMALLOC;
+   vaddr = (void *)get_zeroed_page(GFP_ATOMIC);
+   current->flags &= (~PF_MEMALLOC | flags);
+   return vaddr;
 }
 
 static inline void free_pgtable_page(void *vaddr)
@@ -96,7 +118,7 @@
 
 static inline void *alloc_domain_mem(void)
 {
-   return kmem_cache_alloc(iommu_domain_cache, GFP_ATOMIC);
+   return iommu_kmem_cache_alloc(iommu_domain_cache);
 }
 
 static inline void free_domain_mem(void *vaddr)
@@ -106,7 +128,7 @@
 
 static inline void * alloc_devinfo_mem(void)
 {
-   return kmem_cache_alloc(iommu_devinfo_cache, GFP_ATOMIC);
+   return iommu_kmem_cache_alloc(iommu_devinfo_cache);
 }
 
 static inline void free_devinfo_mem(void *vaddr)
@@ -116,7 +138,7 @@
 
 struct iova *alloc_iova_mem(void)
 {
-   return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC);
+   return iommu_kmem_cache_alloc(iommu_iova_cache);
 }
 
 void free_iova_mem(struct iova *iova)

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 01/10] DMAR detection and parsing logic

2007-06-19 Thread Keshavamurthy, Anil S

This patch adds support for early detection and parsing of DMAR's
(DMA Remapping ) reported to OS via ACPI tables.

DMA remapping(DMAR) devices support enables independent address
translations for Direct Memory Access(DMA) from Devices.
These DMA remapping devices are reported via ACPI tables
and includes pci device scope covered by these DMA
remapping device.

For detailed info on the specification of  "Intel(R) Virtualization 
Technology for Directed I/O Architecture" please see
http://www.intel.com/technology/virtualization/index.htm

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 arch/x86_64/Kconfig   |   11 +
 drivers/pci/Makefile  |3 
 drivers/pci/dmar.c|  327 ++
 include/acpi/actbl1.h |   27 +++-
 include/linux/dmar.h  |   52 +++
 5 files changed, 413 insertions(+), 7 deletions(-)

Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig   2007-06-18 
15:45:39.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-19 13:05:03.0 
-0700
@@ -730,6 +730,17 @@
bool "Support mmconfig PCI config space access"
depends on PCI && ACPI
 
+config DMAR
+   bool "Support for DMA Remapping Devices (EXPERIMENTAL)"
+   depends on PCI_MSI && ACPI && EXPERIMENTAL
+   default y
+   help
+ DMA remapping(DMAR) devices support enables independent address
+ translations for Direct Memory Access(DMA) from Devices.
+ These DMA remapping devices are reported via ACPI tables
+ and includes pci device scope covered by these DMA
+ remapping device.
+
 source "drivers/pci/pcie/Kconfig"
 
 source "drivers/pci/Kconfig"
Index: linux-2.6.22-rc4-mm2/drivers/pci/Makefile
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/Makefile  2007-06-18 
15:45:39.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/Makefile   2007-06-19 13:43:14.0 
-0700
@@ -20,6 +20,9 @@
 # Build the Hypertransport interrupt support
 obj-$(CONFIG_HT_IRQ) += htirq.o
 
+# Build Intel IOMMU support
+obj-$(CONFIG_DMAR) += dmar.o
+
 #
 # Some architectures use the generic PCI setup functions
 #
Index: linux-2.6.22-rc4-mm2/drivers/pci/dmar.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-18 15:45:46.0 
-0700
@@ -0,0 +1,327 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Copyright (C) Ashok Raj <[EMAIL PROTECTED]>
+ * Copyright (C) Shaohua Li <[EMAIL PROTECTED]>
+ * Copyright (C) Anil S Keshavamurthy <[EMAIL PROTECTED]>
+ *
+ * This file implements early detection/parsing of DMA Remapping Devices
+ * reported to OS through BIOS via DMA remapping reporting (DMAR) ACPI
+ * tables.
+ */
+
+#include 
+#include 
+
+#undef PREFIX
+#define PREFIX "DMAR:"
+
+/* No locks are needed as DMA remapping hardware unit
+ * list is constructed at boot time and hotplug of
+ * these units are not supported by the architecture.
+ */
+LIST_HEAD(dmar_drhd_units);
+LIST_HEAD(dmar_rmrr_units);
+
+static struct acpi_table_header * __initdata dmar_tbl;
+
+static void __init dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
+{
+   /*
+* add INCLUDE_ALL at the tail, so scan the list will find it at
+* the very end.
+*/
+   if (drhd->include_all)
+   list_add_tail(>list, _drhd_units);
+   else
+   list_add(>list, _drhd_units);
+}
+
+static void __init dmar_register_rmrr_unit(struct dmar_rmrr_unit *rmrr)
+{
+   list_add(>list, _rmrr_units);
+}
+
+static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope 
*scope,
+  struct pci_dev **dev, u16 segment)
+{
+   struct pci_bus *bus;
+   struct pci_dev *pdev = NULL;
+   struct acpi_dmar_pci_path *path;
+   int count;
+
+   bus = pci_find_bus(segment, scope->bus);
+   path = (struct acpi_dmar_pci_path *)(scope + 1);
+   count = (scope->length - sizeof(struct acpi_dmar_device_scope))
+   / sizeof(struct acpi_dmar_pci_path);
+

[Intel IOMMU 02/10] PCI generic helper function

2007-06-19 Thread Keshavamurthy, Anil S

When devices are under a p2p bridge, upstream
transactions get replaced by the device id of the bridge as it owns the
PCIE transaction. Hence its necessary to setup translations on behalf of the
bridge as well. Due to this limitation all devices under a p2p share the same
domain in a DMAR.

We just cache the type of device, if its a native PCIe device
or not for later use.

changes from previous posting
1) fixed mostly coding style issues nothing major

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>
---
 drivers/pci/pci.h|1 +
 drivers/pci/probe.c  |   14 ++
 drivers/pci/search.c |   30 ++
 include/linux/pci.h  |2 ++
 4 files changed, 47 insertions(+)

Index: linux-2.6.22-rc4-mm2/drivers/pci/pci.h
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/pci.h 2007-06-18 15:44:45.0 
-0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/pci.h  2007-06-18 15:45:07.0 
-0700
@@ -92,3 +92,4 @@
return NULL;
 }
 
+struct pci_dev *pci_find_upstream_pcie_bridge(struct pci_dev *pdev);
Index: linux-2.6.22-rc4-mm2/drivers/pci/probe.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/probe.c   2007-06-18 
15:44:45.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/probe.c2007-06-18 15:45:07.0 
-0700
@@ -851,6 +851,19 @@
kfree(pci_dev);
 }
 
+static void set_pcie_port_type(struct pci_dev *pdev)
+{
+   int pos;
+   u16 reg16;
+
+   pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
+   if (!pos)
+   return;
+   pdev->is_pcie = 1;
+   pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, );
+   pdev->pcie_type = (reg16 & PCI_EXP_FLAGS_TYPE) >> 4;
+}
+
 /**
  * pci_cfg_space_size - get the configuration space size of the PCI device.
  * @dev: PCI device
@@ -965,6 +978,7 @@
dev->device = (l >> 16) & 0x;
dev->cfg_size = pci_cfg_space_size(dev);
dev->error_state = pci_channel_io_normal;
+   set_pcie_port_type(dev);
 
/* Assume 32-bit PCI; let 64-bit PCI cards (which are far rarer)
   set this higher, assuming the system even supports it.  */
Index: linux-2.6.22-rc4-mm2/drivers/pci/search.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/search.c  2007-06-18 
15:44:45.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/search.c   2007-06-18 15:45:07.0 
-0700
@@ -14,6 +14,36 @@
 #include "pci.h"
 
 DECLARE_RWSEM(pci_bus_sem);
+/*
+ * find the upstream PCIE-to-PCI bridge of a PCI device
+ * if the device is PCIE, return NULL
+ * if the device isn't connected to a PCIE bridge (that is its parent is a
+ * legacy PCI bridge and the bridge is directly connected to bus 0), return its
+ * parent
+ */
+struct pci_dev *
+pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
+{
+   struct pci_dev *tmp = NULL;
+
+   if (pdev->is_pcie)
+   return NULL;
+   while (1) {
+   if (!pdev->bus->self)
+   break;
+   pdev = pdev->bus->self;
+   /* a p2p bridge */
+   if (!pdev->is_pcie) {
+   tmp = pdev;
+   continue;
+   }
+   /* PCI device should connect to a PCIE bridge */
+   BUG_ON(pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE);
+   return pdev;
+   }
+
+   return tmp;
+}
 
 static struct pci_bus *pci_do_find_bus(struct pci_bus *bus, unsigned char 
busnr)
 {
Index: linux-2.6.22-rc4-mm2/include/linux/pci.h
===
--- linux-2.6.22-rc4-mm2.orig/include/linux/pci.h   2007-06-18 
15:44:45.0 -0700
+++ linux-2.6.22-rc4-mm2/include/linux/pci.h2007-06-18 15:45:07.0 
-0700
@@ -140,6 +140,7 @@
unsigned short  subsystem_device;
unsigned intclass;  /* 3 bytes: (base,sub,prog-if) */
u8  hdr_type;   /* PCI header type (`multi' flag masked 
out) */
+   u8  pcie_type;  /* PCI-E device/port type */
u8  rom_base_reg;   /* which config register controls the 
ROM */
u8  pin;/* which interrupt pin this device uses 
*/
 
@@ -182,6 +183,7 @@
unsigned intmsi_enabled:1;
unsigned intmsix_enabled:1;
unsigned intis_managed:1;
+   unsigned intis_pcie:1;
atomic_tenable_cnt; /* pci_enable_device has been called */
 
u32 saved_config_space[16]; /* config space saved at 
suspend time */

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 00/10] Intel IOMMU support, take #2

2007-06-19 Thread Keshavamurthy, Anil S

Hi All,
This patch supports the upcomming Intel IOMMU hardware
a.k.a. Intel(R) Virtualization Technology for Directed I/O 
Architecture and the hardware spec for the same can be found here
http://www.intel.com/technology/virtualization/index.htm

This version of the patches incorporates several 
feedback obtained from previous postings.

Some of the major changes are
1) Removed resource pool (a.k.a. pre-allocate pool) patch
2) For memory allocation in the DMA map api calls we
   now use kmem_cache_alloc() and get_zeroedpage() functions
   to allocate memory for internal data structures and for 
   page table setup memory.
3) The allocation of memory in the DMA map api calls is 
   very critical and to avoid failures during memory allocation
   in the DMA map api calls we evaluated several technique
   a) mempool - We found that mempool is pretty much useless
  if we try to allocate memory with GFP_ATOMIC which is
 our case. Also we found that it is difficult to judge
 how much to reserver during the creation of mempool.
   b) PF_MEMALLOC - When a task flags (current->flags) are
 set with PF_MEMALLOC then watermark checks are avoided
 during the memory allocation.
  We choose to use the latter (option b) and make this as
  a separate patch which can be debated further. Please
  see patch 6/10.

Other minor changes are mostly coding style fixes and 
making sure that checkpatch.pl passes the patches.

Please include this set of patches for next MM release.

Thanks and regards,
-Anil S Keshavamurthy
E-mail: [EMAIL PROTECTED]

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 00/10] Intel IOMMU support, take #2

2007-06-19 Thread Keshavamurthy, Anil S

Hi All,
This patch supports the upcomming Intel IOMMU hardware
a.k.a. Intel(R) Virtualization Technology for Directed I/O 
Architecture and the hardware spec for the same can be found here
http://www.intel.com/technology/virtualization/index.htm

This version of the patches incorporates several 
feedback obtained from previous postings.

Some of the major changes are
1) Removed resource pool (a.k.a. pre-allocate pool) patch
2) For memory allocation in the DMA map api calls we
   now use kmem_cache_alloc() and get_zeroedpage() functions
   to allocate memory for internal data structures and for 
   page table setup memory.
3) The allocation of memory in the DMA map api calls is 
   very critical and to avoid failures during memory allocation
   in the DMA map api calls we evaluated several technique
   a) mempool - We found that mempool is pretty much useless
  if we try to allocate memory with GFP_ATOMIC which is
 our case. Also we found that it is difficult to judge
 how much to reserver during the creation of mempool.
   b) PF_MEMALLOC - When a task flags (current-flags) are
 set with PF_MEMALLOC then watermark checks are avoided
 during the memory allocation.
  We choose to use the latter (option b) and make this as
  a separate patch which can be debated further. Please
  see patch 6/10.

Other minor changes are mostly coding style fixes and 
making sure that checkpatch.pl passes the patches.

Please include this set of patches for next MM release.

Thanks and regards,
-Anil S Keshavamurthy
E-mail: [EMAIL PROTECTED]

-- 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 01/10] DMAR detection and parsing logic

2007-06-19 Thread Keshavamurthy, Anil S

This patch adds support for early detection and parsing of DMAR's
(DMA Remapping ) reported to OS via ACPI tables.

DMA remapping(DMAR) devices support enables independent address
translations for Direct Memory Access(DMA) from Devices.
These DMA remapping devices are reported via ACPI tables
and includes pci device scope covered by these DMA
remapping device.

For detailed info on the specification of  Intel(R) Virtualization 
Technology for Directed I/O Architecture please see
http://www.intel.com/technology/virtualization/index.htm

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]

---
 arch/x86_64/Kconfig   |   11 +
 drivers/pci/Makefile  |3 
 drivers/pci/dmar.c|  327 ++
 include/acpi/actbl1.h |   27 +++-
 include/linux/dmar.h  |   52 +++
 5 files changed, 413 insertions(+), 7 deletions(-)

Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig   2007-06-18 
15:45:39.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-19 13:05:03.0 
-0700
@@ -730,6 +730,17 @@
bool Support mmconfig PCI config space access
depends on PCI  ACPI
 
+config DMAR
+   bool Support for DMA Remapping Devices (EXPERIMENTAL)
+   depends on PCI_MSI  ACPI  EXPERIMENTAL
+   default y
+   help
+ DMA remapping(DMAR) devices support enables independent address
+ translations for Direct Memory Access(DMA) from Devices.
+ These DMA remapping devices are reported via ACPI tables
+ and includes pci device scope covered by these DMA
+ remapping device.
+
 source drivers/pci/pcie/Kconfig
 
 source drivers/pci/Kconfig
Index: linux-2.6.22-rc4-mm2/drivers/pci/Makefile
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/Makefile  2007-06-18 
15:45:39.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/Makefile   2007-06-19 13:43:14.0 
-0700
@@ -20,6 +20,9 @@
 # Build the Hypertransport interrupt support
 obj-$(CONFIG_HT_IRQ) += htirq.o
 
+# Build Intel IOMMU support
+obj-$(CONFIG_DMAR) += dmar.o
+
 #
 # Some architectures use the generic PCI setup functions
 #
Index: linux-2.6.22-rc4-mm2/drivers/pci/dmar.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-18 15:45:46.0 
-0700
@@ -0,0 +1,327 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Copyright (C) Ashok Raj [EMAIL PROTECTED]
+ * Copyright (C) Shaohua Li [EMAIL PROTECTED]
+ * Copyright (C) Anil S Keshavamurthy [EMAIL PROTECTED]
+ *
+ * This file implements early detection/parsing of DMA Remapping Devices
+ * reported to OS through BIOS via DMA remapping reporting (DMAR) ACPI
+ * tables.
+ */
+
+#include linux/pci.h
+#include linux/dmar.h
+
+#undef PREFIX
+#define PREFIX DMAR:
+
+/* No locks are needed as DMA remapping hardware unit
+ * list is constructed at boot time and hotplug of
+ * these units are not supported by the architecture.
+ */
+LIST_HEAD(dmar_drhd_units);
+LIST_HEAD(dmar_rmrr_units);
+
+static struct acpi_table_header * __initdata dmar_tbl;
+
+static void __init dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
+{
+   /*
+* add INCLUDE_ALL at the tail, so scan the list will find it at
+* the very end.
+*/
+   if (drhd-include_all)
+   list_add_tail(drhd-list, dmar_drhd_units);
+   else
+   list_add(drhd-list, dmar_drhd_units);
+}
+
+static void __init dmar_register_rmrr_unit(struct dmar_rmrr_unit *rmrr)
+{
+   list_add(rmrr-list, dmar_rmrr_units);
+}
+
+static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope 
*scope,
+  struct pci_dev **dev, u16 segment)
+{
+   struct pci_bus *bus;
+   struct pci_dev *pdev = NULL;
+   struct acpi_dmar_pci_path *path;
+   int count;
+
+   bus = pci_find_bus(segment, scope-bus);
+   path = (struct acpi_dmar_pci_path *)(scope + 1);
+   count = (scope-length - sizeof(struct acpi_dmar_device_scope))
+   / sizeof(struct

[Intel IOMMU 02/10] PCI generic helper function

2007-06-19 Thread Keshavamurthy, Anil S

When devices are under a p2p bridge, upstream
transactions get replaced by the device id of the bridge as it owns the
PCIE transaction. Hence its necessary to setup translations on behalf of the
bridge as well. Due to this limitation all devices under a p2p share the same
domain in a DMAR.

We just cache the type of device, if its a native PCIe device
or not for later use.

changes from previous posting
1) fixed mostly coding style issues nothing major

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]
---
 drivers/pci/pci.h|1 +
 drivers/pci/probe.c  |   14 ++
 drivers/pci/search.c |   30 ++
 include/linux/pci.h  |2 ++
 4 files changed, 47 insertions(+)

Index: linux-2.6.22-rc4-mm2/drivers/pci/pci.h
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/pci.h 2007-06-18 15:44:45.0 
-0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/pci.h  2007-06-18 15:45:07.0 
-0700
@@ -92,3 +92,4 @@
return NULL;
 }
 
+struct pci_dev *pci_find_upstream_pcie_bridge(struct pci_dev *pdev);
Index: linux-2.6.22-rc4-mm2/drivers/pci/probe.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/probe.c   2007-06-18 
15:44:45.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/probe.c2007-06-18 15:45:07.0 
-0700
@@ -851,6 +851,19 @@
kfree(pci_dev);
 }
 
+static void set_pcie_port_type(struct pci_dev *pdev)
+{
+   int pos;
+   u16 reg16;
+
+   pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
+   if (!pos)
+   return;
+   pdev-is_pcie = 1;
+   pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, reg16);
+   pdev-pcie_type = (reg16  PCI_EXP_FLAGS_TYPE)  4;
+}
+
 /**
  * pci_cfg_space_size - get the configuration space size of the PCI device.
  * @dev: PCI device
@@ -965,6 +978,7 @@
dev-device = (l  16)  0x;
dev-cfg_size = pci_cfg_space_size(dev);
dev-error_state = pci_channel_io_normal;
+   set_pcie_port_type(dev);
 
/* Assume 32-bit PCI; let 64-bit PCI cards (which are far rarer)
   set this higher, assuming the system even supports it.  */
Index: linux-2.6.22-rc4-mm2/drivers/pci/search.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/search.c  2007-06-18 
15:44:45.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/search.c   2007-06-18 15:45:07.0 
-0700
@@ -14,6 +14,36 @@
 #include pci.h
 
 DECLARE_RWSEM(pci_bus_sem);
+/*
+ * find the upstream PCIE-to-PCI bridge of a PCI device
+ * if the device is PCIE, return NULL
+ * if the device isn't connected to a PCIE bridge (that is its parent is a
+ * legacy PCI bridge and the bridge is directly connected to bus 0), return its
+ * parent
+ */
+struct pci_dev *
+pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
+{
+   struct pci_dev *tmp = NULL;
+
+   if (pdev-is_pcie)
+   return NULL;
+   while (1) {
+   if (!pdev-bus-self)
+   break;
+   pdev = pdev-bus-self;
+   /* a p2p bridge */
+   if (!pdev-is_pcie) {
+   tmp = pdev;
+   continue;
+   }
+   /* PCI device should connect to a PCIE bridge */
+   BUG_ON(pdev-pcie_type != PCI_EXP_TYPE_PCI_BRIDGE);
+   return pdev;
+   }
+
+   return tmp;
+}
 
 static struct pci_bus *pci_do_find_bus(struct pci_bus *bus, unsigned char 
busnr)
 {
Index: linux-2.6.22-rc4-mm2/include/linux/pci.h
===
--- linux-2.6.22-rc4-mm2.orig/include/linux/pci.h   2007-06-18 
15:44:45.0 -0700
+++ linux-2.6.22-rc4-mm2/include/linux/pci.h2007-06-18 15:45:07.0 
-0700
@@ -140,6 +140,7 @@
unsigned short  subsystem_device;
unsigned intclass;  /* 3 bytes: (base,sub,prog-if) */
u8  hdr_type;   /* PCI header type (`multi' flag masked 
out) */
+   u8  pcie_type;  /* PCI-E device/port type */
u8  rom_base_reg;   /* which config register controls the 
ROM */
u8  pin;/* which interrupt pin this device uses 
*/
 
@@ -182,6 +183,7 @@
unsigned intmsi_enabled:1;
unsigned intmsix_enabled:1;
unsigned intis_managed:1;
+   unsigned intis_pcie:1;
atomic_tenable_cnt; /* pci_enable_device has been called */
 
u32 saved_config_space[16]; /* config space saved at 
suspend time */

-- 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

2007-06-19 Thread Keshavamurthy, Anil S

Intel IOMMU driver needs memory during DMA map calls to setup its internal
page tables and for other data structures. As we all know that these DMA 
map calls are mostly called in the interrupt context or with the spinlock 
held by the upper level drivers(network/storage drivers), so in order to 
avoid any memory allocation failure due to low memory issues,
this patch makes memory allocation by temporarily setting PF_MEMALLOC
flags for the current task before making memory allocation calls.

We evaluated mempools as a backup when kmem_cache_alloc() fails
and found that mempools are really not useful here because
 1) We don;t know for sure how much to reserve in advance
 2) And mempools are not useful for GFP_ATOMIC case (as we call 
memory alloc functions with GFP_ATOMIC)


With PF_MEMALLOC flag set in the current-flags, the VM subsystem avoids
any watermark checks before allocating memory thus guarantee'ing the 
memory till the last free page. Further, looking at the code in 
mm/page_alloc.c in __alloc_pages() function, looks like this
flag is useful only in the non-interrupt context.

If we are in the interrupt context and memory allocation in IOMMU
driver fails for some reason, then the DMA map api's will return 
failure and it is up to the higher level drivers to retry. Suppose,
if upper level driver programs the controller with the buggy 
DMA virtual address, the IOMMU will block that DMA transaction
when that happens thus preventing any corruption to main memory.

So far in our test scenario, we were unable to create
any memory allocation failure inside dma map api calls.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]

---
 drivers/pci/intel-iommu.c |   30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 
15:45:46.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c  2007-06-19 
13:10:29.0 -0700
@@ -84,9 +84,31 @@
 static struct kmem_cache *iommu_devinfo_cache;
 static struct kmem_cache *iommu_iova_cache;
 
+static inline void *iommu_kmem_cache_alloc(struct kmem_cache *cachep)
+{
+   unsigned int flags;
+   void *vaddr;
+
+   /* trying to avoid low memory issues */
+   flags = current-flags  PF_MEMALLOC;
+   current-flags |= PF_MEMALLOC;
+   vaddr = kmem_cache_alloc(cachep, GFP_ATOMIC);
+   current-flags = (~PF_MEMALLOC | flags);
+   return vaddr;
+}
+
+
 static inline void *alloc_pgtable_page(void)
 {
-   return (void *)get_zeroed_page(GFP_ATOMIC);
+   unsigned int flags;
+   void *vaddr;
+
+   /* trying to avoid low memory issues */
+   flags = current-flags  PF_MEMALLOC;
+   current-flags |= PF_MEMALLOC;
+   vaddr = (void *)get_zeroed_page(GFP_ATOMIC);
+   current-flags = (~PF_MEMALLOC | flags);
+   return vaddr;
 }
 
 static inline void free_pgtable_page(void *vaddr)
@@ -96,7 +118,7 @@
 
 static inline void *alloc_domain_mem(void)
 {
-   return kmem_cache_alloc(iommu_domain_cache, GFP_ATOMIC);
+   return iommu_kmem_cache_alloc(iommu_domain_cache);
 }
 
 static inline void free_domain_mem(void *vaddr)
@@ -106,7 +128,7 @@
 
 static inline void * alloc_devinfo_mem(void)
 {
-   return kmem_cache_alloc(iommu_devinfo_cache, GFP_ATOMIC);
+   return iommu_kmem_cache_alloc(iommu_devinfo_cache);
 }
 
 static inline void free_devinfo_mem(void *vaddr)
@@ -116,7 +138,7 @@
 
 struct iova *alloc_iova_mem(void)
 {
-   return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC);
+   return iommu_kmem_cache_alloc(iommu_iova_cache);
 }
 
 void free_iova_mem(struct iova *iova)

-- 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 07/10] Intel iommu cmdline option - forcedac

2007-06-19 Thread Keshavamurthy, Anil S

Introduce intel_iommu=forcedac commandline option.
This option is helpful to verify the pci device capability
of handling physical dma'able address greater than 4G.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]
---
 Documentation/kernel-parameters.txt |7 +++
 drivers/pci/intel-iommu.c   |7 ++-
 2 files changed, 13 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc4-mm2/Documentation/kernel-parameters.txt
===
--- linux-2.6.22-rc4-mm2.orig/Documentation/kernel-parameters.txt   
2007-06-18 15:45:46.0 -0700
+++ linux-2.6.22-rc4-mm2/Documentation/kernel-parameters.txt2007-06-18 
15:45:46.0 -0700
@@ -788,6 +788,13 @@
bypassed by not enabling DMAR with this option. In
this case, gfx device will use physical address for
DMA.
+   forcedac [x86_64]
+   With this option iommu will not optimize to look
+   for io virtual address below 32 bit forcing dual
+   address cycle on pci bus for cards supporting greater
+   than 32 bit addressing. The default is to look
+   for translation below 32 bit and if not available
+   then look in the higher range.
 
io7=[HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 
15:45:46.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c  2007-06-19 
13:09:06.0 -0700
@@ -53,6 +53,7 @@
 
 static int dmar_disabled;
 static int __initdata dmar_map_gfx = 1;
+static int dmar_forcedac;
 
 #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
 static DEFINE_SPINLOCK(device_domain_lock);
@@ -70,6 +71,10 @@
dmar_map_gfx = 0;
printk(KERN_INFO
Intel-IOMMU: disable GFX device mapping\n);
+   } else if (!strncmp(str, forcedac, 8)) {
+   printk (KERN_INFO
+   Intel-IOMMU: Forcing DAC for PCI devices\n);
+   dmar_forcedac = 1;
}
 
str += strcspn(str, ,);
@@ -1557,7 +1562,7 @@
 
start_addr = IOVA_START_ADDR;
 
-   if (pdev-dma_mask = DMA_32BIT_MASK) {
+   if ((pdev-dma_mask = DMA_32BIT_MASK) || (dmar_forcedac)) {
iova = iommu_alloc_iova(domain, addr, size, start_addr,
pdev-dma_mask);
} else  {

-- 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 03/10] clflush_cache_range now takes size param

2007-06-19 Thread Keshavamurthy, Anil S

Introduce the size param for clflush_cache_range().

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]

---
 arch/x86_64/mm/pageattr.c   |6 +++---
 include/asm-x86_64/cacheflush.h |1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

Index: linux-2.6.22-rc4-mm2/arch/x86_64/mm/pageattr.c
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/mm/pageattr.c 2007-06-18 
15:45:39.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/mm/pageattr.c  2007-06-18 
15:45:46.0 -0700
@@ -61,10 +61,10 @@
return base;
 } 
 
-static void cache_flush_page(void *adr)
+void clflush_cache_range(void *adr, int size)
 {
int i;
-   for (i = 0; i  PAGE_SIZE; i += boot_cpu_data.x86_clflush_size)
+   for (i = 0; i  size; i += boot_cpu_data.x86_clflush_size)
asm volatile(clflush (%0) :: r (adr + i));
 }
 
@@ -80,7 +80,7 @@
list_for_each_entry(pg, l, lru) {
void *adr = page_address(pg);
if (cpu_has_clflush)
-   cache_flush_page(adr);
+   clflush_cache_range(adr, PAGE_SIZE);
}
__flush_tlb_all();
 }
Index: linux-2.6.22-rc4-mm2/include/asm-x86_64/cacheflush.h
===
--- linux-2.6.22-rc4-mm2.orig/include/asm-x86_64/cacheflush.h   2007-06-18 
15:45:39.0 -0700
+++ linux-2.6.22-rc4-mm2/include/asm-x86_64/cacheflush.h2007-06-18 
15:45:46.0 -0700
@@ -27,6 +27,7 @@
 void global_flush_tlb(void); 
 int change_page_attr(struct page *page, int numpages, pgprot_t prot);
 int change_page_attr_addr(unsigned long addr, int numpages, pgprot_t prot);
+void clflush_cache_range(void *addr, int size);
 
 #ifdef CONFIG_DEBUG_RODATA
 void mark_rodata_ro(void);

-- 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 09/10] Iommu Gfx workaround

2007-06-19 Thread Keshavamurthy, Anil S

When we fix all the opensource gfx drivers to use the DMA api's,
at that time we can yank this config options out.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]
---
 Documentation/Intel-IOMMU.txt |5 +
 arch/x86_64/Kconfig   |   11 +++
 arch/x86_64/kernel/e820.c |   19 +++
 drivers/pci/intel-iommu.c |   33 +
 drivers/pci/intel-iommu.h |7 +++
 5 files changed, 75 insertions(+)

Index: linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt
===
--- linux-2.6.22-rc4-mm2.orig/Documentation/Intel-IOMMU.txt 2007-06-18 
15:45:08.0 -0700
+++ linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt  2007-06-18 
15:45:08.0 -0700
@@ -57,6 +57,11 @@
 If you encounter issues with graphics devices, you can try adding
 option intel_iommu=igfx_off to turn off the integrated graphics engine.
 
+If it happens to be a PCI device included in the INCLUDE_ALL Engine,
+then try enabling CONFIG_DMAR_GFX_WA to setup a 1-1 map. We hear
+graphics drivers may be in process of using DMA api's in the near
+future and at that time this option can be yanked out.
+
 Some exceptions to IOVA
 ---
 Interrupt ranges are not address translated, (0xfee0 - 0xfeef).
Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig   2007-06-18 
15:45:07.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-18 15:45:08.0 
-0700
@@ -741,6 +741,17 @@
  and includes pci device scope covered by these DMA
  remapping device.
 
+config DMAR_GFX_WA
+   bool Support for Graphics workaround
+   depends on DMAR
+   default y
+   help
+Current Graphics drivers tend to use physical address
+for DMA and avoid using DMA api's. Setting this config
+option permits the IOMMU driver to set a unity map for
+all the OS visible memory. Hence the driver can continue
+to use physical addresses for DMA.
+
 source drivers/pci/pcie/Kconfig
 
 source drivers/pci/Kconfig
Index: linux-2.6.22-rc4-mm2/arch/x86_64/kernel/e820.c
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/kernel/e820.c 2007-06-18 
15:44:44.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/kernel/e820.c  2007-06-18 
15:45:08.0 -0700
@@ -723,3 +723,22 @@
printk(KERN_INFO Allocating PCI resources starting at %lx (gap: 
%lx:%lx)\n,
pci_mem_start, gapstart, gapsize);
 }
+
+int __init arch_get_ram_range(int slot, u64 *addr, u64 *size)
+{
+   int i;
+
+   if (slot  0 || slot = e820.nr_map)
+   return -1;
+   for (i = slot; i  e820.nr_map; i++) {
+   if (e820.map[i].type != E820_RAM)
+   continue;
+   break;
+   }
+   if (i == e820.nr_map || e820.map[i].addr  (max_pfn  PAGE_SHIFT))
+   return -1;
+   *addr = e820.map[i].addr;
+   *size = min_t(u64, e820.map[i].size + e820.map[i].addr,
+   max_pfn  PAGE_SHIFT) - *addr;
+   return i + 1;
+}
Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 
15:45:08.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c  2007-06-18 
15:45:08.0 -0700
@@ -1601,6 +1601,36 @@
rmrr-end_address + 1);
 }
 
+#ifdef CONFIG_DMAR_GFX_WA
+extern int arch_get_ram_range(int slot, u64 *addr, u64 *size);
+static void __init iommu_prepare_gfx_mapping(void)
+{
+   struct pci_dev *pdev = NULL;
+   u64 base, size;
+   int slot;
+   int ret;
+
+   for_each_pci_dev(pdev) {
+   if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO ||
+   !IS_GFX_DEVICE(pdev))
+   continue;
+   printk(KERN_INFO IOMMU: gfx device %s 1-1 mapping\n,
+   pci_name(pdev));
+   slot = arch_get_ram_range(0, base, size);
+   while (slot = 0) {
+   ret = iommu_prepare_identity_map(pdev,
+   base, base + size);
+   if (ret)
+   goto error;
+   slot = arch_get_ram_range(slot, base, size);
+   }
+   continue;
+error:
+   printk(KERN_ERR IOMMU: mapping reserved region failed\n);
+   }
+}
+#endif
+
 int __init init_dmars(void)
 {
struct dmar_drhd_unit *drhd;
@@ -1664,6 +1694,8 @@
}
}
 
+   iommu_prepare_gfx_mapping();
+
/*
 * for each drhd
 *   enable fault log
@@ -2175,3 +2207,4 @@
dma_ops = intel_dma_ops;

[Intel IOMMU 04/10] IOVA allocation and management routines

2007-06-19 Thread Keshavamurthy, Anil S

This code implements a generic IOVA allocation and 
management. As per Dave's suggestion we are now allocating
IO virtual address from Higher DMA limit address rather
than lower end address and this eliminated the need to preserve
the IO virtual address for multiple devices sharing the same
domain virtual address.

Also this code uses red black trees to store the allocated and
reserved iova nodes. This showed a good performance improvements
over previous linear linked list.

Changes from previous posting:
1) Fixed mostly coding style issues

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]
---
 drivers/pci/iova.c |  351 +
 drivers/pci/iova.h |   62 +
 2 files changed, 413 insertions(+)

Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc4-mm2/drivers/pci/iova.c 2007-06-18 15:45:46.0 
-0700
@@ -0,0 +1,351 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Copyright (C) 2006 Anil S Keshavamurthy [EMAIL PROTECTED]
+ */
+
+#include iova.h
+
+void
+init_iova_domain(struct iova_domain *iovad)
+{
+   spin_lock_init(iovad-iova_alloc_lock);
+   spin_lock_init(iovad-iova_rbtree_lock);
+   iovad-rbroot = RB_ROOT;
+   iovad-cached32_node = NULL;
+
+}
+
+static struct rb_node *
+__get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
+{
+   if ((*limit_pfn != DMA_32BIT_PFN) ||
+   (iovad-cached32_node == NULL))
+   return rb_last(iovad-rbroot);
+   else {
+   struct rb_node *prev_node = rb_prev(iovad-cached32_node);
+   struct iova *curr_iova =
+   container_of(iovad-cached32_node, struct iova, node);
+   *limit_pfn = curr_iova-pfn_lo - 1;
+   return prev_node;
+   }
+}
+
+static inline void
+__cached_rbnode_insert_update(struct iova_domain *iovad,
+   unsigned long limit_pfn, struct iova *new)
+{
+   if (limit_pfn != DMA_32BIT_PFN)
+   return;
+   iovad-cached32_node = new-node;
+}
+
+static inline void
+__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
+{
+   struct iova *cached_iova;
+   struct rb_node *curr;
+
+   if (!iovad-cached32_node)
+   return;
+   curr = iovad-cached32_node;
+   cached_iova = container_of(curr, struct iova, node);
+
+   if (free-pfn_lo = cached_iova-pfn_lo)
+   iovad-cached32_node = rb_next(free-node);
+}
+
+static inline int __alloc_iova_range(struct iova_domain *iovad,
+   unsigned long size, unsigned long limit_pfn, struct iova *new)
+{
+   struct rb_node *curr = NULL;
+   unsigned long flags;
+   unsigned long saved_pfn;
+
+   /* Walk the tree backwards */
+   spin_lock_irqsave(iovad-iova_rbtree_lock, flags);
+   saved_pfn = limit_pfn;
+   curr = __get_cached_rbnode(iovad, limit_pfn);
+   while (curr) {
+   struct iova *curr_iova = container_of(curr, struct iova, node);
+   if (limit_pfn  curr_iova-pfn_lo)
+   goto move_left;
+   if (limit_pfn  curr_iova-pfn_hi)
+   goto adjust_limit_pfn;
+   if ((curr_iova-pfn_hi + size) = limit_pfn)
+   break;  /* found a free slot */
+adjust_limit_pfn:
+   limit_pfn = curr_iova-pfn_lo - 1;
+move_left:
+   curr = rb_prev(curr);
+   }
+
+   if ((!curr)  !(IOVA_START_PFN + size = limit_pfn)) {
+   spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags);
+   return -ENOMEM;
+   }
+   new-pfn_hi = limit_pfn;
+   new-pfn_lo = limit_pfn - size + 1;
+
+   spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags);
+   return 0;
+}
+
+static void
+iova_insert_rbtree(struct rb_root *root, struct iova *iova)
+{
+   struct rb_node **new = (root-rb_node), *parent = NULL;
+   /* Figure out where to put new node */
+   while (*new) {
+   struct iova *this = container_of(*new, struct iova, node);
+   parent = *new;
+
+   if (iova-pfn_lo  this-pfn_lo)
+   new = ((*new)-rb_left);
+   else if (iova-pfn_lo  this-pfn_lo)
+   new = ((*new)-rb_right);
+   else
+   BUG(); /* this should not happen */
+   }
+   /* Add new node and rebalance tree. */
+   rb_link_node(iova-node, parent, new);
+   rb_insert_color(iova-node, root);
+}
+
+/**
+ * alloc_iova - allocates an iova
+ * @iovad - iova domain in question
+ * @size - size of page frames to allocate
+ * @limit_pfn - max limit address
+ * This function allocates an iova in the range limit_pfn to IOVA_START_PFN
+ * looking from limit_pfn instead from IOVA_START_PFN.
+ */

[Intel IOMMU 10/10] Iommu floppy workaround

2007-06-19 Thread Keshavamurthy, Anil S

This config option (DMAR_FLPY_WA) sets up 1:1 mapping for the
floppy device so that the floppy device which does not use
DMA api's will continue to work. 

Once the floppy driver starts using DMA api's this config option
can be turn off or this patch can be yanked out of kernel at that
time.


Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]
---
 arch/x86_64/Kconfig   |   10 ++
 drivers/pci/intel-iommu.c |   22 ++
 drivers/pci/intel-iommu.h |7 +++
 3 files changed, 39 insertions(+)

Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig   2007-06-18 
15:45:08.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-18 15:45:09.0 
-0700
@@ -752,6 +752,16 @@
 all the OS visible memory. Hence the driver can continue
 to use physical addresses for DMA.
 
+config DMAR_FLPY_WA
+   bool Support for Floppy disk workaround
+   depends on DMAR
+   default y
+   help
+Floppy disk drivers are know to by pass dma api calls
+their by failing to work when IOMMU is enabled. This
+work around will setup a 1 to 1 mappings for the first
+16M to make floppy(isa device) work.
+
 source drivers/pci/pcie/Kconfig
 
 source drivers/pci/Kconfig
Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 
15:45:08.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c  2007-06-18 
15:45:09.0 -0700
@@ -1631,6 +1631,26 @@
 }
 #endif
 
+#ifdef CONFIG_DMAR_FLPY_WA
+static inline void iommu_prepare_isa(void)
+{
+   struct pci_dev *pdev = NULL;
+   int ret;
+
+   pdev = pci_get_class (PCI_CLASS_BRIDGE_ISA  8, NULL);
+   if (!pdev)
+   return;
+
+   printk (KERN_INFO IOMMU: Prepare 0-16M unity mapping for LPC\n);
+   ret = iommu_prepare_identity_map(pdev, 0, 16*1024*1024);
+
+   if (ret)
+   printk (IOMMU: Failed to create 0-64M identity map, \
+   Floppy might not work\n);
+
+}
+#endif
+
 int __init init_dmars(void)
 {
struct dmar_drhd_unit *drhd;
@@ -1696,6 +1716,8 @@
 
iommu_prepare_gfx_mapping();
 
+   iommu_prepare_isa();
+
/*
 * for each drhd
 *   enable fault log
Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.h
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.h 2007-06-18 
15:45:08.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.h  2007-06-18 
15:45:09.0 -0700
@@ -322,4 +322,11 @@
 }
 #endif /* !CONFIG_DMAR_GFX_WA */
 
+#ifndef CONFIG_DMAR_FLPY_WA
+static inline void iommu_prepare_isa(void)
+{
+   return;
+}
+#endif /* !CONFIG_DMAR_FLPY_WA */
+
 #endif

-- 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Intel IOMMU 08/10] DMAR fault handling support

2007-06-19 Thread Keshavamurthy, Anil S

MSI interrupt handler registrations and fault handling support
for Intel-IOMMU hadrware.

This patch enables the MSI interrupts for the DMA remapping units
and in the interrupt handler read the fault cause and outputs the
same on to the console.

Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED]
---
 Documentation/Intel-IOMMU.txt |   17 +++
 arch/x86_64/kernel/io_apic.c  |   59 
 drivers/pci/intel-iommu.c |  194 ++
 include/linux/dmar.h  |   12 ++
 4 files changed, 281 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt
===
--- linux-2.6.22-rc4-mm2.orig/Documentation/Intel-IOMMU.txt 2007-06-18 
15:45:46.0 -0700
+++ linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt  2007-06-19 
13:05:03.0 -0700
@@ -63,6 +63,15 @@
 The same is true for peer to peer transactions. Hence we reserve the
 address from PCI MMIO ranges so they are not allocated for IOVA addresses.
 
+
+Fault reporting
+---
+When errors are reported, the DMA engine signals via an interrupt. The fault
+reason and device that caused it with fault reason is printed on console.
+
+See below for sample.
+
+
 Boot Message Sample
 ---
 
@@ -85,6 +94,14 @@
 
 PCI-DMA: Using DMAR IOMMU
 
+Fault reporting
+---
+
+DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
+DMAR:[fault reason 05] PTE Write access is not set
+DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
+DMAR:[fault reason 05] PTE Write access is not set
+
 TBD
 
 
Index: linux-2.6.22-rc4-mm2/arch/x86_64/kernel/io_apic.c
===
--- linux-2.6.22-rc4-mm2.orig/arch/x86_64/kernel/io_apic.c  2007-06-18 
15:45:38.0 -0700
+++ linux-2.6.22-rc4-mm2/arch/x86_64/kernel/io_apic.c   2007-06-18 
15:45:46.0 -0700
@@ -31,6 +31,7 @@
 #include linux/sysdev.h
 #include linux/msi.h
 #include linux/htirq.h
+#include linux/dmar.h
 #ifdef CONFIG_ACPI
 #include acpi/acpi_bus.h
 #endif
@@ -2026,8 +2027,64 @@
destroy_irq(irq);
 }
 
-#endif /* CONFIG_PCI_MSI */
+#ifdef CONFIG_DMAR
+#ifdef CONFIG_SMP
+static void dmar_msi_set_affinity(unsigned int irq, cpumask_t mask)
+{
+   struct irq_cfg *cfg = irq_cfg + irq;
+   struct msi_msg msg;
+   unsigned int dest;
+   cpumask_t tmp;
+
+   cpus_and(tmp, mask, cpu_online_map);
+   if (cpus_empty(tmp))
+   return;
+
+   if (assign_irq_vector(irq, mask))
+   return;
+
+   cpus_and(tmp, cfg-domain, mask);
+   dest = cpu_mask_to_apicid(tmp);
+
+   dmar_msi_read(irq, msg);
+
+   msg.data = ~MSI_DATA_VECTOR_MASK;
+   msg.data |= MSI_DATA_VECTOR(cfg-vector);
+   msg.address_lo = ~MSI_ADDR_DEST_ID_MASK;
+   msg.address_lo |= MSI_ADDR_DEST_ID(dest);
+
+   dmar_msi_write(irq, msg);
+   irq_desc[irq].affinity = mask;
+}
+#endif /* CONFIG_SMP */
+
+struct irq_chip dmar_msi_type = {
+   .name = DMAR_MSI,
+   .unmask = dmar_msi_unmask,
+   .mask = dmar_msi_mask,
+   .ack = ack_apic_edge,
+#ifdef CONFIG_SMP
+   .set_affinity = dmar_msi_set_affinity,
+#endif
+   .retrigger = ioapic_retrigger_irq,
+};
+
+int arch_setup_dmar_msi(unsigned int irq)
+{
+   int ret;
+   struct msi_msg msg;
+
+   ret = msi_compose_msg(NULL, irq, msg);
+   if (ret  0)
+   return ret;
+   dmar_msi_write(irq, msg);
+   set_irq_chip_and_handler_name(irq, dmar_msi_type, handle_edge_irq,
+   edge);
+   return 0;
+}
+#endif
 
+#endif /* CONFIG_PCI_MSI */
 /*
  * Hypertransport interrupt support
  */
Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c
===
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 
15:45:46.0 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c  2007-06-19 
13:05:03.0 -0700
@@ -742,6 +742,196 @@
return 0;
 }
 
+/* iommu interrupt handling. Most stuff are MSI-like. */
+
+static char *fault_reason_strings[] =
+{
+   Software,
+   Present bit in root entry is clear,
+   Present bit in context entry is clear,
+   Invalid context entry,
+   Access beyond MGAW,
+   PTE Write access is not set,
+   PTE Read access is not set,
+   Next page table ptr is invalid,
+   Root table address invalid,
+   Context table ptr is invalid,
+   non-zero reserved fields in RTP,
+   non-zero reserved fields in CTP,
+   non-zero reserved fields in PTE,
+   Unknown
+};
+#define MAX_FAULT_REASON_IDX   ARRAY_SIZE(fault_reason_strings)
+
+char *dmar_get_fault_reason(u8 fault_reason)
+{
+   if (fault_reason  MAX_FAULT_REASON_IDX)
+   return fault_reason_strings[MAX_FAULT_REASON_IDX];
+   else
+   return

Re: [Intel IOMMU 05/10] Intel IOMMU driver

2007-06-19 Thread Keshavamurthy, Anil S

On Tue, Jun 19, 2007 at 04:32:23PM -0700, Christoph Lameter wrote:
 On Tue, 19 Jun 2007, Keshavamurthy, Anil S wrote:
 
  +static inline void *alloc_pgtable_page(void)
  +{
  +   return (void *)get_zeroed_page(GFP_ATOMIC);
  +}
 
 Need to pass gfp_t parameter. Repeates a couple of times.
 
  +   addr = (((u64)1)  addr_width) - 1;
  +   parent = domain-pgd;
  +
  +   spin_lock_irqsave(domain-mapping_lock, flags);
  +   while (level  0) {
  +   void *tmp_page;
  +
  +   offset = address_level_offset(addr, level);
  +   pte = parent[offset];
  +   if (level == 1)
  +   break;
  +
  +   if (!dma_pte_present(*pte)) {
  +   tmp_page = alloc_pgtable_page();
 
 Is it not possible here to drop the lock and do the alloc with GFP_KERNEL 
 and deal with the resulting race? That is done in other parts of the 
 kernel.
 
  +/* iommu handling */
  +static int iommu_alloc_root_entry(struct intel_iommu *iommu)
  +{
  +   struct root_entry *root;
  +   unsigned long flags;
  +
  +   root = (struct root_entry *)alloc_pgtable_page();
 
 This may be able to become a GFP_KERNEL alloc since interrupts are enabled 
 at this point?

Memory allocated during driver init is very less and not much benefit
with the suggested changes I think. Please correct me If I am wrong.

The biggest benifit will be when we can figure out GPF_ flags
during runtime when DMA map api's are called. 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel-IOMMU 06/10] Intel IOMMU driver

2007-06-13 Thread Keshavamurthy, Anil S

On Thu, Jun 07, 2007 at 04:57:39PM -0700, Andrew Morton wrote:
> On Wed, 06 Jun 2007 11:57:04 -0700
> [EMAIL PROTECTED] wrote:
Andrew,
Most of your comments make sense, I will fix all of them and resend 
soon.
Thanks for your time taking a look at my patch.

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel-IOMMU 06/10] Intel IOMMU driver

2007-06-13 Thread Keshavamurthy, Anil S

On Thu, Jun 07, 2007 at 04:57:39PM -0700, Andrew Morton wrote:
 On Wed, 06 Jun 2007 11:57:04 -0700
 [EMAIL PROTECTED] wrote:
Andrew,
Most of your comments make sense, I will fix all of them and resend 
soon.
Thanks for your time taking a look at my patch.

-Anil
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel-IOMMU 02/10] Library routine for pre-allocat pool handling

2007-06-11 Thread Keshavamurthy, Anil S

On Mon, Jun 11, 2007 at 02:14:49PM -0700, Andrew Morton wrote:
> On Mon, 11 Jun 2007 13:44:42 -0700
> "Keshavamurthy, Anil S" <[EMAIL PROTECTED]> wrote:
> 
> > In the first implementation of ours, we had used mempools api's to 
> > allocate memory and we were told that mempools with GFP_ATOMIC is
> > useless and hence in the second implementation we came up with
> > resource pools ( which is preallocate pools) and again as I understand
> > the argument is why create another when we have slab allocation which
> > is similar to this resource pools.
> 
> Odd.  mempool with GFP_ATOMIC is basically equivalent to your
> resource-pools, isn't it?: we'll try the slab allocator and if that failed,
> fall back to the reserves.

slab allocators don;t reserve the memory, in other words this memory 
can be consumed by VM under memory pressure which we don;t want in
IOMMU case.

Nope,they both are exactly opposite. 
mempool with GFP_ATOMIC, first tries to get memory from OS and
if that fails, it looks for the object in the pool and returns.

Where as resource pool is exactly opposite of mempool, where each 
time it looks for an object in the pool and if it exist then we 
return that object else we try to get the memory for OS while 
scheduling the work to grow the pool objects. In fact, the  work
is schedule to grow the pool when the low threshold point is hit.

Hence, I still feel resource pool implementation is best choice 
for IOMMU.

-Anil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel-IOMMU 02/10] Library routine for pre-allocat pool handling

2007-06-11 Thread Keshavamurthy, Anil S

On Tue, Jun 12, 2007 at 12:25:57AM +0200, Andi Kleen wrote:
> 
> > Please advice.
> 
> I think the short term only safe option would be to fully preallocate an 
> aperture.
> If it is too small you can try GFP_ATOMIC but it would be just
> a unreliable fallback. For safety you could perhaps have some kernel thread
> that tries to enlarge it in the background depending on current
> use. That would be not 100% guaranteed to keep up with load,
> but would at least keep up if the system is not too busy.
> 
> That is basically what your resource pools do, but they seem
> to be unnecessarily convoluted for the task :- after all you
> could just preallocate the page tables and rewrite/flush them without
> having some kind of allocator inbetween, can't you?
Nope, it is not convoluted for the task. If you see carefully how
the IO virtual address is obtained, I am basically reusing the
the previous translated virutal address once it is freed instead
of going for continious IO virtural address. Because of this
reuse of IO virtual address, these address tend to map to the
same PAGE tables and hence the memory for page tables itself
does not grow unless there is that much IO going on in the System
where all entries in the page tables are full(which means that
much IO is in flight).

The only defect I see in the current resource pool is that
I am queuing the work on Keventd to grow the pool which could
be a problem as many other subsystem in the kernel depends
on keventd and as Anderew pointed out we could dead lock.
If we have a separate worker thread to grow the pool then
the deadlock issues is taken care.

I would still love to get my current resource pool (just fix the
keventd to separate thread to grow the pool) implementations 
to get into linus kernrl compared to kmem_cache_alloc implementation 
as I don;t see any benifit in moving to kmem_cache_alloc. But if 
people want I can provide kmem_cache_alloc implementation 
too just for comparisions. But this does not solve the fundamental 
problem that we have today.

So ideally for IOMMU's we should have some preallocated buffers
and if the buffers reach certain min_threshould the pool should
grow in the background and all of these features is in resource pool
implementation. Since we did not see any problems, can we
atleat try this resource pool implementation in the linux
MM kernels? If it is too bad, then I will change to 
kmem_cache_alloc() version. If this testing is OKAY, then
I will refresh my patch for the coding styles etc and 
resubmit with resource pool implementation. Andrew??

> 
> If you make the start value large enough (256+MB?) that might reasonably
> work. How much memory in page tables would that take? Or perhaps scale
> it with available memory or available devices. 

What you are suggesting is to prealloacate and setup the page tables at the
begining. But this would waste lot of memory because we don't know ahead of
time how large the page table setup should be and in future our hardware
can support 64K domains where each domain can dish out independent address
from its start to end address range. And pre-setup of tables for all of the
64K domains is not feasible.

> 
> In theory it could also be precomputed from the block/network device queue 
> lengths etc.; the trouble is just such checks would need to be added to all 
> kinds of 
> other odd subsystems that manage devices too.  That would be much more work.
> 
> Some investigation how to do sleeping block/network submit would be
> also interesting (e.g. replace the spinlocks there with mutexes and see how
> much it affects performance). For networking you would need to keep 
> at least a non sleeping path though because packets can be legally
> submitted from interrupt context. If it works out then sleeping
> interfaces to the IOMMU code could be added.

Yup, these investigations needs to happen and sooner the better for all and 
for general linux community.

> 
> -Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 >

1 - 100 of 164 matches

Mail list logo