Re: [ewg] Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-17 Thread Alexander Schmidt
On Tue, 16 Jun 2009 09:10:39 -0700
Roland Dreier rdre...@cisco.com wrote:

 
   Yeah, the notifier code remains untouched as we still do not allow dynamic
   memory operations _while_ our module is loaded. The patch allows the 
 driver to
   cope with DMEM operations that happened before the module was loaded, which
   might result in a non-contiguous memory layout. When the driver registers
   its global memory region in the system, the memory layout must be 
 considered.
   
   We chose the term toleration instead of support to illustrate this.
 
 I see.  So things just silently broke in some cases when the driver was
 loaded after operations you didn't tolerate?
 
 Anyway, thanks for the explanation.

Well, things did not break silently. The registration of the MR failed with
an error code which was reported to userspace.

Will you push the patch for .31 or .32?

Thanks,
Alex
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-16 Thread Alexander Schmidt
Hi Roland,

thank you for taking a look at the code!

On Fri, 12 Jun 2009 21:50:58 -0700
Roland Dreier rdre...@cisco.com wrote:

 OK, one major issue with this patch and a few minor nits.
 
 First, the major issue is that I don't see anything in the patch that
 changes the code in ehca_mem_notifier() in ehca_main.c:
 
   case MEM_GOING_ONLINE:
   case MEM_GOING_OFFLINE:
   /* only ok if no hca is attached to the lpar */
   spin_lock_irqsave(shca_list_lock, flags);
   if (list_empty(shca_list)) {
   spin_unlock_irqrestore(shca_list_lock, flags);
   return NOTIFY_OK;
   } else {
   spin_unlock_irqrestore(shca_list_lock, flags);
   if (printk_timed_ratelimit(ehca_dmem_warn_time,
  30 * 1000))
   ehca_gen_err(DMEM operations are not allowed
in conjunction with eHCA);
   return NOTIFY_BAD;
   }
 
 But your patch description says:
 
   This patch implements toleration of dynamic memory operations
 
 But it seems you're still going to hit the same NOTIFY_BAD case above
 after your patch.  So something doesn't compute for me.  Could you
 explain more?

Yeah, the notifier code remains untouched as we still do not allow dynamic
memory operations _while_ our module is loaded. The patch allows the driver to
cope with DMEM operations that happened before the module was loaded, which
might result in a non-contiguous memory layout. When the driver registers
its global memory region in the system, the memory layout must be considered.

We chose the term toleration instead of support to illustrate this.

I'll put some more details into the changelog, incorporate the other comments
and send out a second version of the patch.

Thanks,
Alex

 
 Second, a nit:
 
   +#define EHCA_REG_MR 0
   +#define EHCA_REG_BUSMAP_MR (~0)
 
 and you pass these as the reg_busmap parm in:
 
int ehca_reg_mr(struct ehca_shca *shca,
  struct ehca_mr *e_mr,
  u64 *iova_start,
   @@ -991,7 +1031,8 @@
  struct ehca_pd *e_pd,
  struct ehca_mr_pginfo *pginfo,
  u32 *lkey, /*OUT*/
   -  u32 *rkey) /*OUT*/
   +  u32 *rkey, /*OUT*/
   +  int reg_busmap)
 
 and test it as:
 
   +  if (reg_busmap)
   +  ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo);
   +  else
   +  ret = ehca_reg_mr_rpages(shca, e_mr, pginfo);
 
 So the ~0 for true looks a bit odd.  One option would be to make
 reg_busmap a bool, since that's how you're using it, but then you lose
 the nice self-documenting macro where you call things.
 
 So I think it would be cleaner to do something like
 
 enum ehca_reg_type {
   EHCA_REG_MR,
   EHCA_REG_BUSMAP_MR
 };
 
 and make the int reg_busmap parameter into enum ehca_reg_type reg_type
 and have the code become
 
 + if (reg_type == EHCA_REG_BUSMAP_MR)
 + ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo);
 + else if (reg_type == EHCA_REG_MR)
 + ret = ehca_reg_mr_rpages(shca, e_mr, pginfo);
 + else
 + ret = -EINVAL
 
 or something like that.
 
   +struct ib_dma_mapping_ops ehca_dma_mapping_ops = {
   +  .mapping_error = ehca_dma_mapping_error,
   +  .map_single = ehca_dma_map_single,
   +  .unmap_single = ehca_dma_unmap_single,
   +  .map_page = ehca_dma_map_page,
   +  .unmap_page = ehca_dma_unmap_page,
   +  .map_sg = ehca_dma_map_sg,
   +  .unmap_sg = ehca_dma_unmap_sg,
   +  .dma_address = ehca_dma_address,
   +  .dma_len = ehca_dma_len,
   +  .sync_single_for_cpu = ehca_dma_sync_single_for_cpu,
   +  .sync_single_for_device = ehca_dma_sync_single_for_device,
   +  .alloc_coherent = ehca_dma_alloc_coherent,
   +  .free_coherent = ehca_dma_free_coherent,
   +};
 
 I always think structures like this are easier to read if you align the
 '=' signs.  But no big deal.
 
   +  ret = ehca_create_busmap();
   +  if (ret) {
   +  ehca_gen_err(Cannot create busmap.);
   +  goto module_init2;
   +  }
   +
  ret = ibmebus_register_driver(ehca_driver);
  if (ret) {
  ehca_gen_err(Cannot register eHCA device driver);
  ret = -EINVAL;
   -  goto module_init2;
   +  goto module_init3;
  }

  ret = register_memory_notifier(ehca_mem_nb);
  if (ret) {
  ehca_gen_err(Failed registering memory add/remove notifier);
   -  goto module_init3;
   +  goto module_init4;
 
 Having to renumber unrelated things is when something changes is why I
 don't like this style of error path labels.  But I think it's well and
 truly too late to fix that in ehca.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [ewg] Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-16 Thread Roland Dreier

  Yeah, the notifier code remains untouched as we still do not allow dynamic
  memory operations _while_ our module is loaded. The patch allows the driver 
  to
  cope with DMEM operations that happened before the module was loaded, which
  might result in a non-contiguous memory layout. When the driver registers
  its global memory region in the system, the memory layout must be considered.
  
  We chose the term toleration instead of support to illustrate this.

I see.  So things just silently broke in some cases when the driver was
loaded after operations you didn't tolerate?

Anyway, thanks for the explanation.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-12 Thread Roland Dreier
OK, one major issue with this patch and a few minor nits.

First, the major issue is that I don't see anything in the patch that
changes the code in ehca_mem_notifier() in ehca_main.c:

case MEM_GOING_ONLINE:
case MEM_GOING_OFFLINE:
/* only ok if no hca is attached to the lpar */
spin_lock_irqsave(shca_list_lock, flags);
if (list_empty(shca_list)) {
spin_unlock_irqrestore(shca_list_lock, flags);
return NOTIFY_OK;
} else {
spin_unlock_irqrestore(shca_list_lock, flags);
if (printk_timed_ratelimit(ehca_dmem_warn_time,
   30 * 1000))
ehca_gen_err(DMEM operations are not allowed
 in conjunction with eHCA);
return NOTIFY_BAD;
}

But your patch description says:

  This patch implements toleration of dynamic memory operations

But it seems you're still going to hit the same NOTIFY_BAD case above
after your patch.  So something doesn't compute for me.  Could you
explain more?

Second, a nit:

  +#define EHCA_REG_MR 0
  +#define EHCA_REG_BUSMAP_MR (~0)

and you pass these as the reg_busmap parm in:

   int ehca_reg_mr(struct ehca_shca *shca,
   struct ehca_mr *e_mr,
   u64 *iova_start,
  @@ -991,7 +1031,8 @@
   struct ehca_pd *e_pd,
   struct ehca_mr_pginfo *pginfo,
   u32 *lkey, /*OUT*/
  -u32 *rkey) /*OUT*/
  +u32 *rkey, /*OUT*/
  +int reg_busmap)

and test it as:

  +if (reg_busmap)
  +ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo);
  +else
  +ret = ehca_reg_mr_rpages(shca, e_mr, pginfo);

So the ~0 for true looks a bit odd.  One option would be to make
reg_busmap a bool, since that's how you're using it, but then you lose
the nice self-documenting macro where you call things.

So I think it would be cleaner to do something like

enum ehca_reg_type {
EHCA_REG_MR,
EHCA_REG_BUSMAP_MR
};

and make the int reg_busmap parameter into enum ehca_reg_type reg_type
and have the code become

+   if (reg_type == EHCA_REG_BUSMAP_MR)
+   ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo);
+   else if (reg_type == EHCA_REG_MR)
+   ret = ehca_reg_mr_rpages(shca, e_mr, pginfo);
+   else
+   ret = -EINVAL

or something like that.

  +struct ib_dma_mapping_ops ehca_dma_mapping_ops = {
  +.mapping_error = ehca_dma_mapping_error,
  +.map_single = ehca_dma_map_single,
  +.unmap_single = ehca_dma_unmap_single,
  +.map_page = ehca_dma_map_page,
  +.unmap_page = ehca_dma_unmap_page,
  +.map_sg = ehca_dma_map_sg,
  +.unmap_sg = ehca_dma_unmap_sg,
  +.dma_address = ehca_dma_address,
  +.dma_len = ehca_dma_len,
  +.sync_single_for_cpu = ehca_dma_sync_single_for_cpu,
  +.sync_single_for_device = ehca_dma_sync_single_for_device,
  +.alloc_coherent = ehca_dma_alloc_coherent,
  +.free_coherent = ehca_dma_free_coherent,
  +};

I always think structures like this are easier to read if you align the
'=' signs.  But no big deal.

  +ret = ehca_create_busmap();
  +if (ret) {
  +ehca_gen_err(Cannot create busmap.);
  +goto module_init2;
  +}
  +
   ret = ibmebus_register_driver(ehca_driver);
   if (ret) {
   ehca_gen_err(Cannot register eHCA device driver);
   ret = -EINVAL;
  -goto module_init2;
  +goto module_init3;
   }
   
   ret = register_memory_notifier(ehca_mem_nb);
   if (ret) {
   ehca_gen_err(Failed registering memory add/remove notifier);
  -goto module_init3;
  +goto module_init4;

Having to renumber unrelated things is when something changes is why I
don't like this style of error path labels.  But I think it's well and
truly too late to fix that in ehca.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-10 Thread Hannes Hering
Hi Michael,

On Wednesday 10 June 2009 02:02:36 Michael Ellerman wrote:
 For those of us who haven't read the HEA spec lately, can you give us
 some more detail on that? :)
first of all, please note that this patch is actually for the ehca infiniband
driver. The ehca driver uses an internal memory region, which is supposed to
contain all physical memory. A memory region maps a virtually contiguous
adapter address space to the physical or better absolute address space. The
limitation is that the memory region cannot map non-contiguous virtual adapter
address space. However, on ppc64 machines there is a feature to dynamically add
or remove memory to logical partitions during runtime. These operations scatter
the absolute memory so that the kernel memory has a non-contiguous layout. This
layout cannot be represented in a memory region. The purpose of this code is to
detect the memory layout so that the memory region can be made up of the
existing memory chunks. It also translates the kernel addresses to the memory
region address, which is needed for interaction with the HCA.
 How does it interact with kexec/kdump?
We never tested the ehca driver with kexec/kdump. This patch also doesn't
improve anything in this context.

 phys_to_abs() ? As below, or does it come from somewhere else?
You're right, actually that isn't needed on System p. On the other hand I
needed to choose an address type, which is the base of all mapping. The
busmap holds all addresses as absolute addresses. The absolute addresses can
then be converted in any other type (virt, phys). 

Regards

Hannes

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-09 Thread Michael Ellerman
On Tue, 2009-06-09 at 15:59 +0200, Hannes Hering wrote:
 This patch implements toleration of dynamic memory operations and 16 GB
 gigantic pages. On module load the driver walks through available system
 memory, checks for available memory ranges and then registers the kernel
 internal memory region accordingly. The translation of address ranges is
 implemented via a 3-level busmap.

Hi Hannes,

For those of us who haven't read the HEA spec lately, can you give us
some more detail on that? :)

How does it interact with kexec/kdump?

 +static int ehca_update_busmap(unsigned long pfn, unsigned long nr_pages)
 +{
 + unsigned long i, start_section, end_section;
 + int top, dir, idx;
 +
 + if (!nr_pages)
 + return 0;
 +
 + if (!ehca_bmap) {
 + ehca_bmap = kmalloc(sizeof(struct ehca_bmap), GFP_KERNEL);
 + if (!ehca_bmap)
 + return -ENOMEM;
 + /* Set map block to 0xFF according to EHCA_INVAL_ADDR */
 + memset(ehca_bmap, 0xFF, EHCA_TOP_MAP_SIZE);
 + }
 +
 + start_section = phys_to_abs(pfn * PAGE_SIZE) / EHCA_SECTSIZE;
 + end_section = phys_to_abs((pfn + nr_pages) * PAGE_SIZE) / EHCA_SECTSIZE;


phys_to_abs() ? As below, or does it come from somewhere else?

 arch/powerpc/include/asm/abs_addr.h:
 47 static inline unsigned long phys_to_abs(unsigned long pa)   
 48 {
 49 unsigned long chunk;
 50 
 51 /* This is a no-op on non-iSeries */
 52 if (!firmware_has_feature(FW_FEATURE_ISERIES))
 53 return pa;
 54 
 55 chunk = addr_to_chunk(pa);
 56 
 57 if (chunk  mschunks_map.num_chunks)
 58 chunk = mschunks_map.mapping[chunk];
 59 
 60 return chunk_to_addr(chunk) + (pa  MSCHUNKS_OFFSET_MASK);
 61 }


cheers


signature.asc
Description: This is a digitally signed message part
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev