Re: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread Roland Dreier
On Wed, May 18, 2011 at 11:31 AM, Milton Miller milt...@bga.com wrote:
 So the real question should be why is x86-32 supplying a broken writeq
 instead of letting drivers work out what to do it when needed?

Sounds a lot like what I was asking a couple of years ago :)
http://lkml.org/lkml/2009/4/19/164

But Ingo insisted that non-atomic writeq would be fine:
http://lkml.org/lkml/2009/4/19/167

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 25/37] drivers/infiniband: use .dev.of_node instead of .node in struct of_device

2010-03-11 Thread Roland Dreier
Seems fine... adding EHCA guys just to make sure.

  .node is being removed
  
  Signed-off-by: Grant Likely grant.lik...@secretlab.ca
  ---
  
   drivers/infiniband/hw/ehca/ehca_main.c |   11 ++-
   1 files changed, 6 insertions(+), 5 deletions(-)
  
  diff --git a/drivers/infiniband/hw/ehca/ehca_main.c 
  b/drivers/infiniband/hw/ehca/ehca_main.c
  index 129a6be..2639185 100644
  --- a/drivers/infiniband/hw/ehca/ehca_main.c
  +++ b/drivers/infiniband/hw/ehca/ehca_main.c
  @@ -291,8 +291,9 @@ static int ehca_sense_attributes(struct ehca_shca *shca)
   };
   
   ehca_gen_dbg(Probing adapter %s...,
  - shca-ofdev-node-full_name);
  -loc_code = of_get_property(shca-ofdev-node, ibm,loc-code, NULL);
  + shca-ofdev-dev.of_node-full_name);
  +loc_code = of_get_property(shca-ofdev-dev.of_node, ibm,loc-code,
  +   NULL);
   if (loc_code)
   ehca_gen_dbg( ... location lode=%s, loc_code);
   
  @@ -720,16 +721,16 @@ static int __devinit ehca_probe(struct of_device *dev,
   int ret, i, eq_size;
   unsigned long flags;
   
  -handle = of_get_property(dev-node, ibm,hca-handle, NULL);
  +handle = of_get_property(dev-dev.of_node, ibm,hca-handle, NULL);
   if (!handle) {
   ehca_gen_err(Cannot get eHCA handle for adapter: %s.,
  - dev-node-full_name);
  + dev-dev.of_node-full_name);
   return -ENODEV;
   }
   
   if (!(*handle)) {
   ehca_gen_err(Wrong eHCA handle for adapter: %s.,
  - dev-node-full_name);
  + dev-dev.of_node-full_name);
   return -ENODEV;
   }
   
  
  -- 
Roland Dreier  rola...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 04/10] 8xx: Always pin kernel instruction TLB

2009-11-14 Thread Roland Dreier

   How to make better use of the remaining ITLB slots is tricky.
   Somehow one would want to map at lest one to modules but I cannot see how.
  
  No. If you use modules, you pay the price. Sane embedded solutions
  running in tight environments don't use modules :-) No point pinning
  TLB entries on the vmalloc space, really.

Long ago (2.4 days I think) when using modules on ppc 4xx we hacked the
module_alloc function (or whatever it was called back then) to allocate
space in the kernel pinned TLB instead of using vmalloc.  Gave something
like a 2x speedup for module code, since the 4xx TLB is so small and the
miss handling is so expensive.  I assume it should still be possible to
do a similar hack with current kernels.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 4/6 v5] Memory DLPAR Handling

2009-11-05 Thread Roland Dreier
This isn't a review of this patch -- more a question out of curiousity
about how you actually can do memory remove in practice.  Do you have
any coordination between the platform/hypervisor and the kernel to make
sure that a memory region you might want to remove later gets put into
zone_movable so that there's a chance for the kernel to vacate it?  Or
do you have some other way to coordinate or at least expose which
regions of memory the kernel will have a chance at releasing?

Thanks,
  Roland
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Fix CQE flags reporting

2009-09-01 Thread Roland Dreier
applied, thanks
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Construct MAD redirect replies from request MAD

2009-08-31 Thread Roland Dreier
this seems reasonable to me, applied, thanks.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [ewg] [PATCH] IB/ehca: Construct MAD redirect replies from request MAD

2009-08-28 Thread Roland Dreier

  Given that you seem to like the rest of the code and Jason hasn't spoken 
  up yet, I think we can have Roland merge this patch. Roland, what do you 
  think?

I don't see any problem with the idea and this does sound like a step
forward, so I am planning on merging this (pending review).
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] ehca: use port autodetect mode as default

2009-07-08 Thread Roland Dreier
looks good, applied
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [ewg] Re: [PATCH 2.6.31 try 2] ehca: Tolerate dynamic memory operations and huge pages

2009-06-23 Thread Roland Dreier
applied...
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [ewg] Re: [PATCH 2.6.31 try 2] ehca: Tolerate dynamic memory operations and huge pages

2009-06-22 Thread Roland Dreier
thanks, applied.

  -#define HCAD_VERSION 0026
  +#define HCAD_VERSION 0027

the driver version was already 0027 (since bde2cfaf), so I dropped this chunk.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [ewg] Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-16 Thread Roland Dreier

  Yeah, the notifier code remains untouched as we still do not allow dynamic
  memory operations _while_ our module is loaded. The patch allows the driver 
  to
  cope with DMEM operations that happened before the module was loaded, which
  might result in a non-contiguous memory layout. When the driver registers
  its global memory region in the system, the memory layout must be considered.
  
  We chose the term toleration instead of support to illustrate this.

I see.  So things just silently broke in some cases when the driver was
loaded after operations you didn't tolerate?

Anyway, thanks for the explanation.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/2] lib: Provide generic atomic64_t implementation

2009-06-14 Thread Roland Dreier

  The new Nehalems provide 8 logical threads in a single socket.  All
  those threads share a cache, and they have cmpxchg8b anyway, so this
  won't matter.

FWIW, Nehalem EX actually goes to 8 cores/16 threads per socket.  But
worrying about 32-bit performance on Nehalem is a little silly -- this
simplest solution is simply to run a 64-bit kernel.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-12 Thread Roland Dreier
OK, one major issue with this patch and a few minor nits.

First, the major issue is that I don't see anything in the patch that
changes the code in ehca_mem_notifier() in ehca_main.c:

case MEM_GOING_ONLINE:
case MEM_GOING_OFFLINE:
/* only ok if no hca is attached to the lpar */
spin_lock_irqsave(shca_list_lock, flags);
if (list_empty(shca_list)) {
spin_unlock_irqrestore(shca_list_lock, flags);
return NOTIFY_OK;
} else {
spin_unlock_irqrestore(shca_list_lock, flags);
if (printk_timed_ratelimit(ehca_dmem_warn_time,
   30 * 1000))
ehca_gen_err(DMEM operations are not allowed
 in conjunction with eHCA);
return NOTIFY_BAD;
}

But your patch description says:

  This patch implements toleration of dynamic memory operations

But it seems you're still going to hit the same NOTIFY_BAD case above
after your patch.  So something doesn't compute for me.  Could you
explain more?

Second, a nit:

  +#define EHCA_REG_MR 0
  +#define EHCA_REG_BUSMAP_MR (~0)

and you pass these as the reg_busmap parm in:

   int ehca_reg_mr(struct ehca_shca *shca,
   struct ehca_mr *e_mr,
   u64 *iova_start,
  @@ -991,7 +1031,8 @@
   struct ehca_pd *e_pd,
   struct ehca_mr_pginfo *pginfo,
   u32 *lkey, /*OUT*/
  -u32 *rkey) /*OUT*/
  +u32 *rkey, /*OUT*/
  +int reg_busmap)

and test it as:

  +if (reg_busmap)
  +ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo);
  +else
  +ret = ehca_reg_mr_rpages(shca, e_mr, pginfo);

So the ~0 for true looks a bit odd.  One option would be to make
reg_busmap a bool, since that's how you're using it, but then you lose
the nice self-documenting macro where you call things.

So I think it would be cleaner to do something like

enum ehca_reg_type {
EHCA_REG_MR,
EHCA_REG_BUSMAP_MR
};

and make the int reg_busmap parameter into enum ehca_reg_type reg_type
and have the code become

+   if (reg_type == EHCA_REG_BUSMAP_MR)
+   ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo);
+   else if (reg_type == EHCA_REG_MR)
+   ret = ehca_reg_mr_rpages(shca, e_mr, pginfo);
+   else
+   ret = -EINVAL

or something like that.

  +struct ib_dma_mapping_ops ehca_dma_mapping_ops = {
  +.mapping_error = ehca_dma_mapping_error,
  +.map_single = ehca_dma_map_single,
  +.unmap_single = ehca_dma_unmap_single,
  +.map_page = ehca_dma_map_page,
  +.unmap_page = ehca_dma_unmap_page,
  +.map_sg = ehca_dma_map_sg,
  +.unmap_sg = ehca_dma_unmap_sg,
  +.dma_address = ehca_dma_address,
  +.dma_len = ehca_dma_len,
  +.sync_single_for_cpu = ehca_dma_sync_single_for_cpu,
  +.sync_single_for_device = ehca_dma_sync_single_for_device,
  +.alloc_coherent = ehca_dma_alloc_coherent,
  +.free_coherent = ehca_dma_free_coherent,
  +};

I always think structures like this are easier to read if you align the
'=' signs.  But no big deal.

  +ret = ehca_create_busmap();
  +if (ret) {
  +ehca_gen_err(Cannot create busmap.);
  +goto module_init2;
  +}
  +
   ret = ibmebus_register_driver(ehca_driver);
   if (ret) {
   ehca_gen_err(Cannot register eHCA device driver);
   ret = -EINVAL;
  -goto module_init2;
  +goto module_init3;
   }
   
   ret = register_memory_notifier(ehca_mem_nb);
   if (ret) {
   ehca_gen_err(Failed registering memory add/remove notifier);
  -goto module_init3;
  +goto module_init4;

Having to renumber unrelated things is when something changes is why I
don't like this style of error path labels.  But I think it's well and
truly too late to fix that in ehca.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Remove superfluous bitmasks from QP control block

2009-06-03 Thread Roland Dreier
looks fine, applied for 2.6.31
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [ewg] Re: [PATCH 1/3] IB/ehca: Replace vmalloc with kmalloc

2009-04-28 Thread Roland Dreier
  did you have a chance to take a look at the patchset and will you apply it, 
  or
  are there any outstanding issues we need to address?

I guess it's OK, but definitely 2.6.31 material.  I guess I'll stick it
linux-next soon.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 1/3] IB/ehca: Replace vmalloc with kmalloc

2009-04-28 Thread Roland Dreier
thanks, applied.

  From: Anton Blanchard antonb at au1.ibm.com
  Signed-off-by: Stefan Roscher stefan.roscher at de.ibm.com

please use '@' signs so these are real email addresses.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 3/3] IB/ehca: Increment version number

2009-04-28 Thread Roland Dreier
thanks, applied 2  3.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 0/3] IB/ehca: Perfomance improvment for creation of queue pairs

2009-04-21 Thread Roland Dreier
  Because of this fundamental code change we will also increment the version 
  number.

So this is 2.6.31-targeted stuff I assume?  (Too late in the 2.6.30
cycle for fundamental code change of course)

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 1/3] IB/ehca: Replace vmalloc with kmalloc

2009-04-21 Thread Roland Dreier
  +queue-queue_pages = kmalloc(nr_of_pages * sizeof(void *), GFP_KERNEL);

How big might this buffer be?  Any chance of allocation failure due to
memory fragmentation?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH v5] introduce macro spin_event_timeout()

2009-03-10 Thread Roland Dreier
  Alan did have one valid point though.  Determining how long to loop
  for is architecture-specific.  Using jiffies is bad, because even one
  jiffy is too long.  Adding a udelay() inside the loop means that it
  only checks he condition every microsecond.  So the real solution is
  to use keep looping until a certain amount of time has passed.  This
  means using an architecture-specific timebase register.

Are there really cases where spinning for 1 jiffy is too long of a
timeout?  It might make sense for the parameter passed in to be in terms
of microseconds but I have a hard time coming up with a case where
having the real timeout be 40 msecs or whatever 1 jiffy ends up being is
a real problem -- after all, this helper is intended for the case where
we expect the condition to become true much sooner than the worst case.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: FW: [PATCH] powerpc/mm: Export HPAGE_SHIFT

2009-02-04 Thread Roland Dreier
 huge_page_size(page_hstate(page))

   That would suit.  I assume the intention is for that to be usable by
   driver modules on any architecture?

  erm, you overestimate the amount of planning and forethought which goes
  into these things ;)

  The lack of any EXPORT_SYMBOL(size_to_hstate) is a broadish hint.

Heh.  Looking into the implementation, it seems that I could actually do

PAGE_SIZE  compound_order(page)

directly (since there's no reason to go from size to hstate and back to
size.  I don't know all the details of these VM internals, but that
seems to only work on the first (small) page of a giant page?  Which
causes problems for what we're trying to do here...

To summarize the goal, we are mapping user memory to a device that has
its own page tables, where the device's page tables can also use
multiple page sizes.  Using big pages on the device leads to similar
efficiencies as hugetlb pages do on the CPU, and in fact if a user has
used hugetlb pages for the memory they're giving to the device, that's a
very strong hint that the device should use big pages too.

But one valid situation we have to handle in the driver is if, say,
userspace has a hugetlb mapped at virtual address 0x20, and wants to
map 0x8 bytes at 0x28 to the device.  In that case, we're going
to do essentially

get_user_pages(..., 0x28, 0x8 / PAGE_SIZE, ...)

and get_user_pages() is going to give us a bunch of normal PAGE_SIZE
pages starting at offset 0x80 within the compound page that makes up
the huge page mapped at 0x20.

get_user_pages() also gives us the vma back, and we can see from
is_vm_hugetlb_page() (-- BTW can I just say that a function
is_xxx_page() that operates on vmas is horribly misnamed --) that these
pages all come from a hugetlb mapping, but figuring out the size of that
mapping is I guess a challenge.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: FW: [PATCH] powerpc/mm: Export HPAGE_SHIFT

2009-02-04 Thread Roland Dreier
  Right, but then you need to set that in the VMA's, and thus gone is your
  nice fast g_u_p() that doesn't touch VMAs :-)

Registering memory is a slow path thing in the RDMA world.  Speeding it
up is nice, so we make userspace do the madvise(VM_DONTCOPY) if it cares
but if it doesn't it can leave it out.

   Yes, but unfortunately MPI says apps can allocate memory however they
   damn well please... in any case these issues are all-too-well-known in
   the RDMA world for quite a while.

  Yup. What do you think of the idea of pre-COWing pages with an elevated
  count at fork time ?

Super-duper sucks if the first thing the child does is exec() :)
Also if the parent has registered  half the memory in the system then
it's instant OOM.  So not that useful for the RDMA case :)

The one thing that might make sense is to pre-COW any partial pages that
the parent has registered -- ie if half a page can be used by the child,
at least pre-COW that, but leave all the full pages with VM_DONTCOPY.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


FW: [PATCH] powerpc/mm: Export HPAGE_SHIFT

2009-02-03 Thread Roland Dreier
Forwarding Eli's patch below, since PowerPC guys may have missed it.  I
guess the question for Ben et al is whether there is any issue with
exporting HPAGE_SHIFT for modules (can be EXPORT_SYMBOL_GPL if you feel
it's an internal detail).  It would probably make sense to roll this
change into the mlx4 change that Eli alludes to below and merge through
my tree (with ppc maintainer acks of course), rather than splitting this
patch out and introducing cross-tree dependencies (and also separating
the rationale for the change from the change itself).

Thanks,
  Roland


Drivers may want to take advantage of the large pages used for memory obtained
from hugetlbfs. One example is mlx4_ib which can use much less MTT entries (in
the order of HPAGE_SIZE / PAGE_SIZE) when registering such memory, thus scale
significantly better when registering larger memory regions. Other drivers
could also benefit from this.

Signed-off-by: Eli Cohen e...@mellanox.co.il
---
 arch/powerpc/mm/hash_utils_64.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 8d5b475..6cff8c7 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -104,6 +104,7 @@ int mmu_highuser_ssize = MMU_SEGSIZE_256M;
 u16 mmu_slb_size = 64;
 #ifdef CONFIG_HUGETLB_PAGE
 unsigned int HPAGE_SHIFT;
+EXPORT_SYMBOL(HPAGE_SHIFT);
 #endif
 #ifdef CONFIG_PPC_64K_PAGES
 int mmu_ci_restrictions;
-- 
1.6.1


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] infiniband/ehca: use consistent type

2009-01-16 Thread Roland Dreier
I'm going to apply this for 2.6.29, since the change to the u64 type
went upstream, so ehca spews a bunch of warnings now.

I'll also add the following patch to fix all the printk format warnings:

IB/ehca: Fix printk format warnings from u64 type change

Commit fe21 (powerpc: Change u64/s64 to a long long integer
type) changed u64 from unsigned long to unsigned long long, which
means that printk formats for printing u64 values should use ll
instead of l to avoid warnings.  Fix all the places affected by this
in ehca.

Signed-off-by: Roland Dreier rola...@cisco.com
---
 drivers/infiniband/hw/ehca/ehca_cq.c |   16 ++--
 drivers/infiniband/hw/ehca/ehca_hca.c|2 +-
 drivers/infiniband/hw/ehca/ehca_irq.c|   18 ++--
 drivers/infiniband/hw/ehca/ehca_main.c   |6 +-
 drivers/infiniband/hw/ehca/ehca_mcast.c  |4 +-
 drivers/infiniband/hw/ehca/ehca_mrmw.c   |  144 +++---
 drivers/infiniband/hw/ehca/ehca_qp.c |   32 
 drivers/infiniband/hw/ehca/ehca_reqs.c   |2 +-
 drivers/infiniband/hw/ehca/ehca_sqp.c|2 +-
 drivers/infiniband/hw/ehca/ehca_tools.h  |2 +-
 drivers/infiniband/hw/ehca/ehca_uverbs.c |2 +-
 drivers/infiniband/hw/ehca/hcp_if.c  |   30 +++---
 12 files changed, 130 insertions(+), 130 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c 
b/drivers/infiniband/hw/ehca/ehca_cq.c
index 2f4c28a..97e4b23 100644
--- a/drivers/infiniband/hw/ehca/ehca_cq.c
+++ b/drivers/infiniband/hw/ehca/ehca_cq.c
@@ -196,7 +196,7 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int 
cqe, int comp_vector,
 
if (h_ret != H_SUCCESS) {
ehca_err(device, hipz_h_alloc_resource_cq() failed 
-h_ret=%li device=%p, h_ret, device);
+h_ret=%lli device=%p, h_ret, device);
cq = ERR_PTR(ehca2ib_return_code(h_ret));
goto create_cq_exit2;
}
@@ -232,7 +232,7 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int 
cqe, int comp_vector,
 
if (h_ret  H_SUCCESS) {
ehca_err(device, hipz_h_register_rpage_cq() failed 
-ehca_cq=%p cq_num=%x h_ret=%li counter=%i 
+ehca_cq=%p cq_num=%x h_ret=%lli counter=%i 
 act_pages=%i, my_cq, my_cq-cq_number,
 h_ret, counter, param.act_pages);
cq = ERR_PTR(-EINVAL);
@@ -244,7 +244,7 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int 
cqe, int comp_vector,
if ((h_ret != H_SUCCESS) || vpage) {
ehca_err(device, Registration of pages not 
 complete ehca_cq=%p cq_num=%x 
-h_ret=%li, my_cq, my_cq-cq_number,
+h_ret=%lli, my_cq, my_cq-cq_number,
 h_ret);
cq = ERR_PTR(-EAGAIN);
goto create_cq_exit4;
@@ -252,7 +252,7 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int 
cqe, int comp_vector,
} else {
if (h_ret != H_PAGE_REGISTERED) {
ehca_err(device, Registration of page failed 
-ehca_cq=%p cq_num=%x h_ret=%li 
+ehca_cq=%p cq_num=%x h_ret=%lli 
 counter=%i act_pages=%i,
 my_cq, my_cq-cq_number,
 h_ret, counter, param.act_pages);
@@ -266,7 +266,7 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int 
cqe, int comp_vector,
 
gal = my_cq-galpas.kernel;
cqx_fec = hipz_galpa_load(gal, CQTEMM_OFFSET(cqx_fec));
-   ehca_dbg(device, ehca_cq=%p cq_num=%x CQX_FEC=%lx,
+   ehca_dbg(device, ehca_cq=%p cq_num=%x CQX_FEC=%llx,
 my_cq, my_cq-cq_number, cqx_fec);
 
my_cq-ib_cq.cqe = my_cq-nr_of_entries =
@@ -307,7 +307,7 @@ create_cq_exit3:
h_ret = hipz_h_destroy_cq(adapter_handle, my_cq, 1);
if (h_ret != H_SUCCESS)
ehca_err(device, hipz_h_destroy_cq() failed ehca_cq=%p 
-cq_num=%x h_ret=%li, my_cq, my_cq-cq_number, h_ret);
+cq_num=%x h_ret=%lli, my_cq, my_cq-cq_number, 
h_ret);
 
 create_cq_exit2:
write_lock_irqsave(ehca_cq_idr_lock, flags);
@@ -355,7 +355,7 @@ int ehca_destroy_cq(struct ib_cq *cq)
h_ret = hipz_h_destroy_cq(adapter_handle, my_cq, 0);
if (h_ret == H_R_STATE) {
/* cq in err: read err data and destroy it forcibly */
-   ehca_dbg(device, ehca_cq=%p cq_num=%x ressource=%lx in err 
+   ehca_dbg(device, ehca_cq=%p cq_num=%x resource=%llx in err

Re: [PATCH] infiniband/ehca: use consistent type

2009-01-16 Thread Roland Dreier
  Sorry to appear picky, but how is that different from the patch powerpc:
  cleanup from powerpc l64 to ll64 change: drivers/infiniband that I sent
  to you on Jan 7?

Well, I didn't lose my version and forget all about it ;)

Seriously, I must have lost that patch, sorry about that.  I'll dig it
out and replace mine so you get credit.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: mal_probe crash

2009-01-09 Thread Roland Dreier
  Can you double check that the e1000 isn't copying the PCI resources into
  a unsigned long before ioremap'ing the result, thus cropping the top
  bits ?

as far as I can see, e1000 is using pci_ioremap_bar(), which should do
the right thing as long as resource_size_t is the right type (which it
looks like it is on PowerPC 44x).

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] infiniband/ehca: spin_lock_irqsave takes an unsigned long

2009-01-07 Thread Roland Dreier
thanks, applied.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] infiniband/ehca: use consistent type

2009-01-07 Thread Roland Dreier
If we're going to clean this code up, does it make sense to take it
further?  More precisely, your patch does:

@@ -226,7 +226,7 @@ u64 hipz_h_alloc_resource_eq(const struct 
ipz_adapter_handle adapter_handle,
 u32 *eq_ist)
 {
u64 ret;
-   u64 outs[PLPAR_HCALL9_BUFSIZE];
+   unsigned long outs[PLPAR_HCALL9_BUFSIZE];
u64 allocate_controls;

but every parameter of ehca_plpar_hcall9() is unsigned long, and the
return value is a signed long.  So should we change ret to long and all
these other declarations to unsigned long while we're touching the code
here?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] infiniband/ehca: spin_lock_irqsave takes an unsigned long

2008-12-30 Thread Roland Dreier
are you trying to land the 'typedef unsigned long u64' change for
2.6.28, or can these patches wait for 2.6.29?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: replace modulus operations in flush error completion path

2008-12-05 Thread Roland Dreier
thanks, applied
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Fix locking for shca_list_lock

2008-11-21 Thread Roland Dreier
Looks good... I'll add this for 2.6.29, since as far as I can tell this
bug has been there approximately forever already.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Fix suppression of port activation events

2008-11-10 Thread Roland Dreier
  A previous fix introduced a regression where port activation events were
  dropped unconditionally if port autodetection was not enabled. Fixed.

Is this a fix to IB/ehca: Remove reference to special QP in case of
port activation failure?  Because if so I can roll it into that patch,
since Linus hasn't pulled it yet.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: remove reference to the QP in case of port activation failure

2008-11-04 Thread Roland Dreier
thanks, applied
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Fix problem with max number of QPs and CQs in systems with different adapters

2008-10-22 Thread Roland Dreier
thanks, applied
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH]IB/ehca:reject dynamic memory add/remove

2008-10-22 Thread Roland Dreier
thanks, applied with a slightly expanded changelog.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 1/1] IB/ehca: Disallow creating UC QP with SRQ

2008-10-10 Thread Roland Dreier
thanks, applied -- it didn't apply to the latest tree, because of the
flush CQE changes, so I merged it manually as below -- let me know if
this is wrong:

commit 0540bbbe455e123a1692d26205ad1a29983883b0
Author: Hoang-Nam Nguyen [EMAIL PROTECTED]
Date:   Fri Oct 10 14:40:39 2008 -0700

IB/ehca: Don't allow creating UC QP with SRQ

This patch prevents a UC QP to be created attached to an SRQ, since
current firmware does not support this feature.

Signed-off-by: Michael Faath [EMAIL PROTECTED]
Signed-off-by: Roland Dreier [EMAIL PROTECTED]

diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c 
b/drivers/infiniband/hw/ehca/ehca_qp.c
index 4dbe287..40b578d 100644
--- a/drivers/infiniband/hw/ehca/ehca_qp.c
+++ b/drivers/infiniband/hw/ehca/ehca_qp.c
@@ -502,6 +502,12 @@ static struct ehca_qp *internal_create_qp(
if (init_attr-srq) {
my_srq = container_of(init_attr-srq, struct ehca_qp, ib_srq);
 
+   if (qp_type == IB_QPT_UC) {
+   ehca_err(pd-device, UC with SRQ not supported);
+   atomic_dec(shca-num_qps);
+   return ERR_PTR(-EINVAL);
+   }
+
has_srq = 1;
parms.ext_type = EQPT_SRQBASE;
parms.srq_qpn = my_srq-real_qp_num;
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] ib/ehca: add flush CQE generation

2008-09-16 Thread Roland Dreier
thanks, queued for 2.6.28.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH] powerpc: Avoid integer overflow in page_is_ram()

2008-09-15 Thread Roland Dreier
Commit 8b150478 (ppc: make phys_mem_access_prot() work with pfns
instead of addresses) fixed page_is_ram() in arch/ppc to avoid overflow
for addresses above 4G on 32-bit kernels.  However arch/powerpc's
page_is_ram() is missing the same fix -- it computes a physical address
by doing pfn  PAGE_SHIFT, which overflows if pfn corresponds to a page
above 4G.

In particular this causes pages above 4G to be mapped with the wrong
caching attribute; for example many ppc440-based SoCs have PCI space
above 4G, and mmap()ing MMIO space may end up with a mapping that has
caching enabled.

Fix this by working with the pfn and avoiding the conversion to
physical address that causes the overflow.  This patch compares the
pfn to max_pfn, which is a semantic change from the old code -- that
code compared the physical address to high_memory, which corresponds
to max_low_pfn.  However, I think that was is another bug, since
highmem pages are still RAM.

Reported-by: vb [EMAIL PROTECTED]
Signed-off-by: Roland Dreier [EMAIL PROTECTED]
Acked-by: Benjamin Herrenschmidt [EMAIL PROTECTED]

---
Paul, didn't see this in your list... please add for 2.6.28.

 arch/powerpc/mm/mem.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 1c93c25..98d7bf9 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -75,11 +75,10 @@ static inline pte_t *virt_to_kpte(unsigned long vaddr)
 
 int page_is_ram(unsigned long pfn)
 {
-   unsigned long paddr = (pfn  PAGE_SHIFT);
-
 #ifndef CONFIG_PPC64   /* XXX for now */
-   return paddr  __pa(high_memory);
+   return pfn  max_pfn;
 #else
+   unsigned long paddr = (pfn  PAGE_SHIFT);
int i;
for (i=0; i  lmb.memory.cnt; i++) {
unsigned long base;
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] powerpc: Avoid integer overflow in page_is_ram()

2008-09-15 Thread Roland Dreier
   #ifndef CONFIG_PPC64   /* XXX for now */
   -  return paddr  __pa(high_memory);
   +  return pfn  max_pfn;
   #else
   +  unsigned long paddr = (pfn  PAGE_SHIFT);
  
  seems like this could be a phys_addr_t

Yes, it could I guess, but that would be an unrelated change, and I'm
not sure there's much point given this is in 64-bit-only code.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: demuxing irqs

2008-09-13 Thread Roland Dreier
  Can someone point me at a simple example of how to demux irqs using
  the powerpc irq functions? I have eight devices on a single irq and I
  want to turn them into virtual irqs.

Sorry about the previous reply.

Anyway, what are you going to demux based on?  Do you have some other
signal you can read in the interrupt dispatch code that tells you which
device raised the interrupt?  What happens if two devices raise an
interrupt at the same time?

If you just have 8 interrupt lines wire-ORed together then you probably
just need to register your interrupt handlers with IRQF_SHARED and run 8
interrupt handlers on an interrupt.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: demuxing irqs

2008-09-13 Thread Roland Dreier
  The muxed interrupts are inside a SOC CPU.  For example eight GPIOs
  can each individually be enabled to trigger hardware interrupt 7. When
  I get hw interrupt 7 i want to demux it into 8 virtual interrupts.
  There are eight bit registers for individually acking, enabling, etc
  each of the eight multiplexed interrupts. With eight virutal
  interrupts each user can register a different handler and isn't aware
  the muxing is going on.

I see... well, assuming all the issues around handling multiple
simultaneous (or overlapping) interrupts are OK, you could look at how
arch/powerpc/sysdev/uic.c handles the cascaded interrupt controllers for
4xx SoCs.  The idea of that code is that you get an interrupt, and look
look at interrupt cause register 0, and see interrupt X, and to handle
interrupt X you read interrupt cause register 1 to see which the real
interrupt is.  And you might see interrupt Y, which means to go onto
interrupt cause register 2.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 1/9] powerpc/44x: Add PowerPC 44x simple platform support

2008-08-20 Thread Roland Dreier
  You should fix 52xx with the same for loop change then, since I
  blatantly most of this file from you ;)

Heh, then factor out mpc5200_simple_probe() into a helper and use it
instead of copying it as ppc44x_probe?  ;)

If you stick to the NULL-terminated array version, then it becomes easy
to convert some other platform probing code to use your new code too, eg
tqm85xx_probe() could use an array too.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 1/9] powerpc/44x: Add PowerPC 44x simple platform support

2008-08-20 Thread Roland Dreier
  Except that logically doesn't make much sense.  Why would you have a
  list of mpc52xx and 44x boards together?  They require completely
  different kernels because the MMU and drive set is entirely different.
  
  Or am I totally missing what you are saying?

Yeah, I wasn't clear -- I meant to add a new helper like
of_flat_dt_is_compatible_list() (not sure of the name) that takes a node
and a NULL-terminated array of strings, and then mpc5200_simple_probe()
can become a one-liner, along with mpc5121_generic_probe(),
tqm85xx_probe(), ppc44x_probe(), etc.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 1/9] powerpc/44x: Add PowerPC 44x simple platform support

2008-08-20 Thread Roland Dreier
   Yeah, I wasn't clear -- I meant to add a new helper like
   of_flat_dt_is_compatible_list() (not sure of the name) that takes a node
   and a NULL-terminated array of strings, and then mpc5200_simple_probe()
   can become a one-liner, along with mpc5121_generic_probe(),
   tqm85xx_probe(), ppc44x_probe(), etc.

  I worry about doing that though.  I don't want to give people the
  impression that these boards are of compatible.  I want it to be more
  this file supports these explicit platforms and makes no claim on
  compatibility between them.

It's no big deal really -- just my knee-jerk reaction to seeing a
comment like this should also be fixed in some other place because I
copied the code from there.  My reaction to that is always, why not
use one common copy of the code then?

You could call it something like of_flat_dt_match_list() if that makes
more sense to you.  Or just copy the for loop around, it's not much code.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 5/5 try2] ib/ehca: discard double CQE for one WR

2008-08-12 Thread Roland Dreier
thanks, applied all 5.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] isdn: mISDN HFC PCI support depends on virt_to_bus()

2008-07-27 Thread Roland Dreier
  IMHO, this driver was really rushed in and that was a huge mistake.
  If it had gone through linux-next we could have tidied all of this
  stuff up in a less rushed manner.

??  This thread *is* about a build failure in linux-next.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 2/2] IB/ehca: Default value for Local CA ACK Delay

2008-07-21 Thread Roland Dreier
thanks, applied 1 and 2
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [patch 9/9] powerpc/cell: Add DMA_ATTR_STRONG_ORDERING dma attribute and use in IOMMU code

2008-07-15 Thread Roland Dreier
Sorry for the late comments, I missed this when it went by before.

  +DMA_ATTR_STRONG_ORDERING
  +--
  +
  +DMA_ATTR_STRONG_ORDERING specifies that previous reads and writes are
  +performed in the order in which they're received by the IOMMU; thus
  +reads and writes may not pass each other.

I don't understand what this is trying to say.  What is previous
referring to?  What does received by the IOMMU mean -- do you mean
issued onto the bus by the CPU? When you say reads and writes may not
pass each other, do you mean just that reads may not pass writes and
writes may not pass reads, or do you mean that reads also can't pass
reads and writes can't pass writes?

Since I don't know exactly what this attribute does, I can't be sure,
but it seems that making weak ordering the default is dangerous in that
it breaks drivers that expect usual memory ordering semantics.  Would it
be safer/better to make strong ordering the default and then add a
WEAK_ORDERING attribute that drivers can use as an optimization?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [patch 9/9] powerpc/cell: Add DMA_ATTR_STRONG_ORDERING dma attribute and use in IOMMU code

2008-07-15 Thread Roland Dreier
  This is all about inbound transfers, i.e. DMAs coming from the I/O bridge
  into the CPU, both DMA read and DMA write.

OK -- at a minimum I think the documentation should make that clear.  As
it stands the description of what this does is pretty much impossible to
parse and understand.

  Strong ordering is only active when both the bridge and the IOMMU enable
  it, but for correctly written drivers, this only results in a slowdown.

So when would someone use this dma attribute?  As a hack to fix drivers
where the real fix is too complicated?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Make device table externally visible

2008-07-01 Thread Roland Dreier
thanks, applied
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH REPOST #2] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

2008-06-23 Thread Roland Dreier
ok, I queued this for 2.6.27.  thanks everyone
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Reject recv WRs if QP is in RESET state

2008-06-20 Thread Roland Dreier
thanks, applied.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH REPOST #2] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

2008-06-19 Thread Roland Dreier
  During corner case testing, we noticed that some versions of ehca 
  do not properly transition to interrupt done in special load situations.
  This can be resolved by periodically triggering EOI through H_EOI, 
  if eqes are pending.
  
  Signed-off-by: Stefan Roscher [EMAIL PROTECTED]
  ---
  As firmware team suggested I moved the call of the EOI h_call into 
  the handler function, this ensures that we will call EOI only when we 
  find a valid eqe on the event queue.
  Additionally I changed the calculation of the xirr value as Roland suggested.

paulus / benh -- does this version still get your ack?  Seems that fw
team is OK with it according to Stefan...

If so I will add this to my tree for 2.6.27.

  diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c 
  b/drivers/infiniband/hw/ehca/ehca_irq.c
  index ce1ab05..0792d93 100644
  --- a/drivers/infiniband/hw/ehca/ehca_irq.c
  +++ b/drivers/infiniband/hw/ehca/ehca_irq.c
  @@ -531,7 +531,7 @@ void ehca_process_eq(struct ehca_shca *shca, int is_irq)
   {
   struct ehca_eq *eq = shca-eq;
   struct ehca_eqe_cache_entry *eqe_cache = eq-eqe_cache;
  -u64 eqe_value;
  +u64 eqe_value, ret;
   unsigned long flags;
   int eqe_cnt, i;
   int eq_empty = 0;
  @@ -583,8 +583,13 @@ void ehca_process_eq(struct ehca_shca *shca, int is_irq)
   ehca_dbg(shca-ib_device,
No eqe found for irq event);
   goto unlock_irq_spinlock;
  -} else if (!is_irq)
  +} else if (!is_irq) {
  +ret = hipz_h_eoi(eq-ist);
  +if (ret != H_SUCCESS)
  +ehca_err(shca-ib_device,
  + bad return code EOI -rc = %ld\n, ret);
   ehca_dbg(shca-ib_device, deadman found %x eqe, eqe_cnt);
  +}
   if (unlikely(eqe_cnt == EHCA_EQE_CACHE_SIZE))
   ehca_dbg(shca-ib_device, too many eqes for one irq event);
   /* enable irq for new packets */
  diff --git a/drivers/infiniband/hw/ehca/hcp_if.c 
  b/drivers/infiniband/hw/ehca/hcp_if.c
  index 5245e13..415d3a4 100644
  --- a/drivers/infiniband/hw/ehca/hcp_if.c
  +++ b/drivers/infiniband/hw/ehca/hcp_if.c
  @@ -933,3 +933,13 @@ u64 hipz_h_error_data(const struct ipz_adapter_handle 
  adapter_handle,
  r_cb,
  0, 0, 0, 0);
   }
  +
  +u64 hipz_h_eoi(int irq)
  +{
  +unsigned long xirr;
  +
  +iosync();
  +xirr = (0xffULL  24) | irq;
  +
  +return plpar_hcall_norets(H_EOI, xirr);
  +}
  diff --git a/drivers/infiniband/hw/ehca/hcp_if.h 
  b/drivers/infiniband/hw/ehca/hcp_if.h
  index 60ce02b..2c3c6e0 100644
  --- a/drivers/infiniband/hw/ehca/hcp_if.h
  +++ b/drivers/infiniband/hw/ehca/hcp_if.h
  @@ -260,5 +260,6 @@ u64 hipz_h_error_data(const struct ipz_adapter_handle 
  adapter_handle,
 const u64 ressource_handle,
 void *rblock,
 unsigned long *byte_count);
  +u64 hipz_h_eoi(int irq);
   
   #endif /* __HCP_IF_H__ */
  -- 
  1.5.5
  
  
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH REPOST] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

2008-06-10 Thread Roland Dreier
  During corner case testing, we noticed that some versions of ehca 
  do not properly transition to interrupt done in special load situations.
  This can be resolved by periodically triggering EOI through H_EOI, 
  if eqes are pending.

So just to be clear: this is a workaround for a hardware/firmware bug?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH REPOST] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

2008-06-10 Thread Roland Dreier
   So just to be clear: this is a workaround for a hardware/firmware bug?

  Yes it is.

OK, so paulus et al... does it seem like a good approach to call H_EOI
from driver code (given that this driver makes tons of other hcalls)?

How critical is this?  Since you said corner case testing I suspect we
can defer this to 2.6.27 and maybe get it into -stable later?

Also, out of curiousity:

  +u64 hipz_h_eoi(int irq)
  +{
  +int value;
  +unsigned long xirr;
  +
  +iosync();

what is the iosync() required for here?

  +value = (0xff  24) | irq;
  +xirr = value  0x;

given that irq and value are ints, is there any possible way value could
have bits outside of the low 32 set?  If you're worried about sign
extension isn't it simpler to just make value unsigned?

  +return plpar_hcall_norets(H_EOI, xirr);
  +}

ie why not:

u64 hipz_h_eoi(int irq)
{
unsigned xirr;

iosync();
xirr = (0xff  24) | irq;
return plpar_hcall_norets(H_EOI, xirr);
}
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-06-10 Thread Roland Dreier
  me too.  That's the whole basis for readX_relaxed() and its cohorts: we
  make our weirdest machines (like altix) conform to the x86 norm.  Then
  where it really kills us we introduce additional semantics to selected
  drivers that enable us to recover I/O speed on the abnormal platforms.

Except as I pointed out before, Altix doesn't conform to the norm and
many (most?) drivers are missing mmiowb()s that are needed for Altix.
Just no one has plugged most devices into an Altix (or haven't stressed
the driver in a way that exposes problems of IO ordering between CPUs).

It would be a great thing to use the powerpc trick of setting a flag
that is tested by spin_unlock()/mutex_unlock() and automatically doing
the mmiowb() if needed, and then killing off mmiowb() entirely.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 2/2] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

2008-06-09 Thread Roland Dreier
  +if (desc-chip  desc-chip-eoi)
  +desc-chip-eoi(irq);

This doesn't seem like the type of thing a device driver should be
doing.  Both patches are rather short on changelog text -- what is the
issue here and how does this fix it?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: IB/ehca: Reject send WRs only for RESET, INIT and RTR state

2008-06-06 Thread Roland Dreier
thanks, applied.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-05-29 Thread Roland Dreier
  The problem is that your two writel's, despite being both issued on
  cpu X, due to the spin lock, in your example, can end up with the
  first one going through NR 1 and the second one going through NR 2. If
  there's contention on NR 1, the write going via NR 2 may hit the PCI
  bridge prior to the one going via NR 1.

Really??  I can't see how you can expect any drivers to work reliably if
simple code like

writel(reg1);
writel(reg2);

might end up with the write to reg2 hitting the device before the write
to reg1.  Almost all MMIO stuff has ordering requirements and will
totally break if you allow, say, the start IO write to be reordered
before the set IO address write.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-05-27 Thread Roland Dreier
  Actually, this specifically should not be.  The need for mmiowb on altix
  is because it explicitly violates some of the PCI rules that would
  otherwise impede performance.   The compromise is that readX on altix
  contains the needed dma flush but there's a variant operator,
  readX_relaxed that doesn't (for drivers that know what they're doing).
  The altix critical drivers have all been converted to use the relaxed
  form for performance, and the unconverted ones should all operate just
  fine (albeit potentially more slowly).

Is this a recent change?  Because as of October 2007, 76d7cc03
(IB/mthca: Use mmiowb() to avoid firmware commands getting jumbled up)
was needed.  But this was involving writel() (__raw_writel() actually,
looking at the code), not readl().  But writel_relaxed() doesn't exist
(and doesn't make sense).

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-05-27 Thread Roland Dreier
  Um, OK, you've said write twice now ... I was assuming you meant read.
  Even on an x86, writes are posted, so there's no way a spin lock could
  serialise a write without an intervening read to flush the posting
  (that's why only reads have a relaxed version on altix).  Or is there
  something else I'm missing?

Writes are posted yes, but not reordered arbitrarily.  If I have code like:

spin_lock(mmio_lock);
writel(val1, reg1);
writel(val2, reg2);
spin_unlock(mmio_lock);

then I have a reasonable expectation that if two CPUs run this at the
same time, their writes to reg1/reg2 won't be interleaved with each
other (because the whole section is inside a spinlock).  And Altix
violates that expectation.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-05-27 Thread Roland Dreier
  Writes are posted yes, but not reordered arbitrarily.

on standard x86 I mean here...
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-05-27 Thread Roland Dreier
   Writes are posted yes, but not reordered arbitrarily.  If I have code like:
  
  spin_lock(mmio_lock);
  writel(val1, reg1);
  writel(val2, reg2);
  spin_unlock(mmio_lock);
  
   then I have a reasonable expectation that if two CPUs run this at the
   same time, their writes to reg1/reg2 won't be interleaved with each
   other (because the whole section is inside a spinlock).  And Altix
   violates that expectation.
  
  Does that necessarily follow?
  
  If you've got a large system with multiple pci bridges, could you end
  up with posted writes coming from different cpus taking a different
  amount of time to propagate to a device and thus colliding?

Not on x86.  And a given PCI device can only be reached from a single
host bridge, so I don't see how it can happen.  But on SGI altix
systems, there is a routed fabric between the CPU and the PCI bus, so
the reordering can happen there.  Hence mmiowb() and the endless supply
of driver bugs that it causes.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-05-27 Thread Roland Dreier
  This is a different issue. We deal with it on powerpc by having writel
  set a per-cpu flag and spin_unlock() test it, and do the barrier if
  needed there.

Cool... I assume you do this for mutex_unlock() etc?

Is there any reason why ia64 can't do this too so we can kill mmiowb and
save everyone a lot of hassle?  (mips, sh and frv have non-empty
mmiowb() definitions too but I'd guess that these are all bugs based on
misunderstandings of the mmiowb() semantics...)

  However, drivers such as e1000 -also- have a wmb() between filling the
  ring buffer and kicking the DMA with MMIO, with a comment about this
  being needed for ia64 relaxed ordering.

I put these barriers into mthca, mlx4 etc, although it came from my
possible misunderstanding of the memory ordering rules in the kernel
more than any experience of problems (as opposed the the mmiowb()s,
which all came from real world bugs).

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-05-27 Thread Roland Dreier
   Cool... I assume you do this for mutex_unlock() etc?

  That's a good point... I don't think we do. Maybe we should.

I think it's needed -- take a look at 76d7cc03, which came from a real
bug seen on Altix boxes.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [ewg] [PATCH] IB/ehca: Protect QP against destroying until all async events for it are handled.

2008-05-07 Thread Roland Dreier
  We are not sure if this should be fixed in the driver or in uverbs itself.
  Roland, what's your opinion about this?

Would be nice to be able to fix it in uverbs but I don't see how.  In
particular a kernel consumer has to have the same guarantee that no
async events will come in after destroy QP returns.  And I don't see any
way generic code can provide a guarantee about what low-level driver
code may do internally.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [ewg] [PATCH] IB/ehca: Protect QP against destroying until all async events for it are handled.

2008-05-07 Thread Roland Dreier
  I agree, that's why I posted the driver fix first.

Glad we agree :)

I thought about it a little more and really convinced myself that there
is no good way for generic code to handle this race.

  So, will you apply it next?

Yes, will apply it shortly.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [ewg] [PATCH] IB/ehca: Protect QP against destroying until all async events for it are handled.

2008-05-07 Thread Roland Dreier
So I applied this, but thinking about it further, do you (theoretically
at least) have the same problem with CQ and SRQ async events?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [ewg] [PATCH] IB/ehca: Change function return types to correct type.

2008-05-05 Thread Roland Dreier
thanks, applied
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs

2008-04-29 Thread Roland Dreier
  
  Signed-off-by: Stefan Roscher stefan.roscher at de.ibm.com

Kind of an inadequate changelog ;)

Is this a fix or an enhancement or what?

  +if (atomic_read(shca-num_cqs) = ehca_max_cq) {

  +if (atomic_read(shca-num_qps) = ehca_max_qp) {

These are racy in the sense that multiple simultaneous calls to
create_cq/create_qp might end up exceeding the ehca_max_cq limit.  Is
that an issue?

You could close the race by using atomic_add_unless() and testing the
return value (and being careful to do atomic_dec() on error paths after
you bump num_cqs/num_qps).

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [REPOST][PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs

2008-04-29 Thread Roland Dreier
thanks, makes sense, applied.

fast turnaround too ;)
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: IB/ehca: handle negative return value from ibmebus_request_irq() properly in ehca_create_eq()

2008-04-28 Thread Roland Dreier
thanks, applied
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Prevent sending UD packets to QP0

2008-01-30 Thread Roland Dreier
thanks, applied
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 2/2] IB/ehca: Add PMA support

2008-01-30 Thread Roland Dreier
thanks, applied 1-2
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: Reminder: removal of arch/ppc

2008-01-25 Thread Roland Dreier
  Hm.  Katmai is already in-tree.  Theoretically I should be able to get
  the HW docs and do a new DTS and have it mostly just work.  Perhaps
  I'll give that a shot and have you give it a quick spin.

Yep, should be pretty easy.  A while back I built a Katmai kernel in
arch/ppc based on the Yucca support and the differences were pretty
trivial -- as I recall, the Katmai has compact flash support that the
Yucca doesn't have, and the Yucca has a simple little FPGA with a few
registers that controls power to the PCI slots.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: Reminder: removal of arch/ppc

2008-01-25 Thread Roland Dreier
  YUCCA
  
  Yucca was what again?  440spe?

Yes, it was the first 440SPe eval board; since replaced by Katmai.

I have a Yucca and can at least test thing; if no one else gets to it,
I may try to port it to arch/powerpc for 2.6.26.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 4/4] IB/ehca: Prevent RDMA-related connection failures

2008-01-18 Thread Roland Dreier
thanks, applied 1-4.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [ofa-general] [PATCH] IB/ehca: Forward event client-reregister-required to registered clients

2008-01-03 Thread Roland Dreier
thanks, applied.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Fix lock flag location, bump version number

2007-12-13 Thread Roland Dreier
applied, thanks
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5

2007-12-12 Thread Roland Dreier
  What is the fix you suggest, to add a device query that tells you for 
  which verbs the documentation does not apply? or enhance the code of the 
map_phys_fmr verb within the ehca driver to return error if called 
  from non-sleepable context?

I think the right fix for iSER would be to make iSER work even for
devices that don't support FMRs.  For example cxgb3 doesn't implement
FMRs so if anyone ever updates iSER to work on iWARP and not just IB,
then this is something that has to be tackled anyway.  Then ehca could
just get rid of the FMR support it has.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Return correct #SGEs for SRQ

2007-12-12 Thread Roland Dreier
thanks, applied.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Serialize HCA-related hCalls if necessary

2007-12-12 Thread Roland Dreier
thanks, applied.

With your next batch of patches for 2.6.25, could you clean up:

  --- a/drivers/infiniband/hw/ehca/hcp_if.c
  +++ b/drivers/infiniband/hw/ehca/hcp_if.c
  @@ -89,6 +89,7 @@
   #define HCALL9_REGS_FORMAT HCALL7_REGS_FORMAT  r11=%lx r12=%lx
   
   static DEFINE_SPINLOCK(hcall_lock);
  +extern int ehca_lock_hcalls;

and move that extern declaration into an appropriate header file?

Thanks...
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5

2007-12-10 Thread Roland Dreier
   map_phys_fmr
  
  In fact, we do use hCalls there. Our hardware doesn't actually support FMRs,
  so we translate a map FMR into a reallocate PMR, which doesn't work
  without hCalls. What's more, the hCalls involved (e.g. H_FREE_RESOURCE)
  might well return H_LONG_BUSY, so the whole operation might sleep; no way
  around it.

It's a big problem.  If you cannot implement FMRs in such a way that
you can handling having map_phys_fmr being called in a context that
can't sleep, then I think the only option is to remove your FMR
support.  It's an optional device feature, so this should be OK
(although the iSER driver currently seems to depend on a device
supporting FMRs, which is probably going to be a problem with iWARP
support in the future anyway).

The fact that consumers can map FMRs from interrupt context, while
holding locks, etc, is pretty fundamental to the use of FMRs so I
don't see any way around the requirement that map_phys_fmr never
sleep.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5

2007-12-09 Thread Roland Dreier
  I think it needs some more inspection. The msleep in there is only called
  for hcalls that return H_IS_LONG_BUSY(). In theory, you can call
  ehca_plpar_hcall_norets() from inside an interrupt handler if the
  hcall in question never returns long busy.

Fair enough... according to Documentation/infiniband/core_locking.txt,
the only driver methods that cannot sleep are:

create_ah
modify_ah
query_ah
destroy_ah
bind_mw
post_send
post_recv
poll_cq
req_notify_cq
map_phys_fmr

and I don't think ehca does an hcall from any of those.  Of course
there might be other driver-internal code paths that I don't know
about.  Maybe do a quick audit and then stick might_sleep() in the
hcall functions to catch any mistakes?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5

2007-12-06 Thread Roland Dreier
   +   ehca_lock_hcalls = !(cur_cpu_spec-cpu_user_features
   +         PPC_FEATURE_ARCH_2_05);

  We already talked about this yesterday, but I still feel that checking the
  instruction set of the CPU should not be used to determine whether a
  specific device driver implementation is used int hypervisor.

I had the same reaction... is testing cpu_user_features really the
best way to detect this issue?

I'll hold off applying this for a few days so you guys can decide the
best thing to do.  We'll definitely get some fix into 2.6.24 but we
have time to make a good decision.

  Regarding the performance problem, have you checked whether converting all
  your spin_lock_irqsave to spin_lock/spin_lock_irq improves your performance
  on the older machines? Maybe it's already fast enough that way.

It does seem that the only places that the hcall_lock is taken also
use msleep, so they must always be in process context.  So you can
safely just use spin_lock(), right?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Fix static rate if path faster than link

2007-11-28 Thread Roland Dreier
thanks, applied
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Fix static rate regression

2007-11-24 Thread Roland Dreier
thanks, applied.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 2/2] IB/ehca: Fix static rate calculation

2007-11-07 Thread Roland Dreier
thanks, applied both patches.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 4/5] ibmebus: Move to of_device and of_platform_driver, match eHCA and eHEA drivers

2007-10-03 Thread Roland Dreier
   Replace struct ibmebus_dev and struct ibmebus_driver with struct of_device
   and struct of_platform_driver, respectively. Match the external ibmebus
   interface and drivers using it.
   
   Signed-off-by: Joachim Fenkes [EMAIL PROTECTED]
  
  If not, then you need to get an Acked-by and an agreement that this
  change can go via the powerpc.git tree from Roland Dreier and Jeff
  Garzik.

I don't see anything objectionable in the infiniband parts of the
patch -- I don't have any way to test the changes but it all looks
like a straightforward conversion to a new platform API.  So:

Acked-by: Roland Dreier [EMAIL PROTECTED]

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 3/3] IB/ehca: Make sure user pages are from hugetlb before using MR large pages

2007-09-20 Thread Roland Dreier
thanks, applied this and the umem patch...
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 2/3] IB/umem: Add hugetlb flag to struct ib_umem

2007-09-19 Thread Roland Dreier
This looks realy nice to me... a very clean patch.

I'll add this to 2.6.24 unless someone objects soon...
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 1/3] IB/ehca: Fix large page HW cap defines

2007-09-17 Thread Roland Dreier
obviously OK...applied.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 02/12] IB/ehca: Add 1 is not longer needed because of firmware interface change

2007-09-12 Thread Roland Dreier
What happens if someone runs the new driver with older firmware?  Or
what if someone upgrades the firmware without updating the driver?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] IB/ehca: Make sure user pages are from hugetlb before using MR large pages

2007-09-12 Thread Roland Dreier
  -#define HCA_CAP_MR_PGSIZE_4K  1
  -#define HCA_CAP_MR_PGSIZE_64K 2
  -#define HCA_CAP_MR_PGSIZE_1M  4
  -#define HCA_CAP_MR_PGSIZE_16M 8
  +#define HCA_CAP_MR_PGSIZE_4K  0x8000
  +#define HCA_CAP_MR_PGSIZE_64K 0x4000
  +#define HCA_CAP_MR_PGSIZE_1M  0x2000
  +#define HCA_CAP_MR_PGSIZE_16M 0x1000

Not sure I understand what this has to do with things... is this an
unrelated fix?

  +static int ehca_is_mem_hugetlb(unsigned long addr, unsigned long size)

This is rather awful -- another call to get_user_pages() to iterate
over all the vmas...

I would suggest extending ib_umem_get() to check the vmas and adding a
member to struct ib_umem to say whether the memory is entirely covered
by hugetlb pages or not.

  +ret = ehca_is_mem_hugetlb(virt, length);
  +switch (ret) {
  +case 0: /* mem is not from hugetlb */
  +hwpage_size = PAGE_SIZE;
  +break;
  +case 1:
  +if (length = EHCA_MR_PGSIZE4K
  + PAGE_SIZE == EHCA_MR_PGSIZE4K)
  +hwpage_size = EHCA_MR_PGSIZE4K;
  +else if (length = EHCA_MR_PGSIZE64K)
  +hwpage_size = EHCA_MR_PGSIZE64K;
  +else if (length = EHCA_MR_PGSIZE1M)
  +hwpage_size = EHCA_MR_PGSIZE1M;
  +else
  +hwpage_size = EHCA_MR_PGSIZE16M;
  +break;
  +default: /* out of mem */
  +ib_mr = ERR_PTR(-ENOMEM);
  +goto reg_user_mr_exit1;

It seems like it would be better to just assume the memory is not from
a hugetlb is ehca_is_mem_hugetlb() fails its memory allocation and
fall back to the PAGE_SIZE case rather than failing entirely.

Also if someone runs a kernel with 64K pages on a machine where they
end up being simulated from 4K pages, do you have the same issue with
the hypervisor ganging together non-contiguous pages?

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [ofa-general] [PATCH 7/7] IB/ehca: Prevent overwriting QP init attributes given by caller

2007-08-16 Thread Roland Dreier
I don't understand this patch.  rdma/ib_verbs.h says this about
ib_create_qp():

 * @qp_init_attr: A list of initial attributes required to create the
 *   QP.  If QP creation succeeds, then the attributes are updated to
 *   the actual capabilities of the created QP.

So it seems the current code is actually correct and your patch breaks
it??
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [ofa-general] [PATCH 1/2] ehca: remove checkpatch.pl's warnings externs should be avoided in .c files

2007-07-30 Thread Roland Dreier
  Was going to recreate this patch, but then I saw that you
  probably have incorporated it (manually) in your latest git.
  Just want to make sure I'm seeing it right.

Yes, I ended up doing it by hand.  Thanks.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 2/5] ehca: Generate event when SRQ limit reached

2007-07-20 Thread Roland Dreier
thanks, applied.

BTW, does your SRQ-capable hardware support generating the last WQE
reached event?  There's not any reliable way to avoid problems when
destroying QPs attached to an SRQ without it, and the IB spec requires
CAs that support SRQs to generate it (o11-5.2.5 in chapter 11 of vol 1).

I don't see any code in ehca to generate the event, and IPoIB CM at
least will be very unhappy when using SRQs if the event is not
generated.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 3/5] ehca: Make ehca2ib_return_code() non-inline

2007-07-20 Thread Roland Dreier
thanks, applied
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [ofa-general] [PATCH 5/5] ehca: Support small QP queues

2007-07-20 Thread Roland Dreier
thanks, applied.  I fixed this up myself to work with commit 20c2df83,
which got rid of the destructor argument to kmem_cache_create() -- you
probably want to check my tree to make sure it's OK.

Also the same as I said before about checkpatch.pl's warning:

WARNING: externs should be avoided in .c files
#337: FILE: drivers/infiniband/hw/ehca/ehca_pd.c:91:
+   extern struct kmem_cache *small_qp_cache;

please fix that up when you get a chance
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 01/10] IB/ehca: Support for multiple event queues

2007-07-17 Thread Roland Dreier
  Here's some anecdotal evidence :)
  http://lists.openfabrics.org/pipermail/general/2007-May/035758.html

Right, but then we went on to say that we probably want to use
multiple vectors to separate out multiple HCA ports rather than
send/sreceive on the same port.  And the current IPoIB implementation
of having that second CQ seems suboptimal anyway, since it seems to
leave us susceptible to the interrupt overload that NAPI was supposed
to solve.

At a higher level, I'm left wondering why nobody talked about multiple
EQs during the last months of the 2.6.22 process and now all of a
sudden it becomes urgent in the last few days of the 2.6.23 merge
window.  That's not really how I like to merge features

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


  1   2   >