[ewg] Re: Possible process deadlock in RMPP flow

2009-09-24 Thread Or Gerlitz
Eli Cohen wrote:
> On Wed, Sep 23, 2009 at 09:08:28AM -0700, Sean Hefty wrote:
>> What kernel does 1.4.2 map to?
> I think OFED 1.4.2 is based on kernel 2.6.27 but they're using RHEL 5.3

Yes, the usual mess: OFED X is based on kernel Y1, with some additions from
kernel Y2 plus plenty of unreviewed and unmerged patches. Distro Z picks OFED
X, and the result is 99% unsupportable, as Roland said. Somehow this OFED
creature is still hanging around, working on the next damage it's going to
bring into this world (code name 1.5).

Eli, here's a little tip for you: I had the displeasure of resolving a bunch
of support cases that originated from the fact that the two-year-old commit
below was missing from some OFED version (sorry, I forgot the number...).
Maybe it would help you as well?

Under normal circumstances, if this commit actually fixed a bug being hit by
many customers, someone would have opened a distro bugzilla case saying "please
pick this commit for your kernel", and the customers would have either waited
for the next distro update or used an intermediate distro kernel. Currently,
as I understand it, distros just pick OFED versions and that's it.

Or.

commit b61d92d8ae6aa13b17d1c31e69d123879cec2ee2
Author: Sean Hefty sean.he...@intel.com
Date:   Fri Nov 30 17:30:18 2007 -0800

IB/mad: Fix incorrect access to items on local_list


___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: Possible process deadlock in RMPP flow

2009-09-24 Thread Eli Cohen
On Thu, Sep 24, 2009 at 09:38:43AM +0300, Or Gerlitz wrote:
 
> commit b61d92d8ae6aa13b17d1c31e69d123879cec2ee2
> Author: Sean Hefty sean.he...@intel.com
> Date:   Fri Nov 30 17:30:18 2007 -0800
>
> IB/mad: Fix incorrect access to items on local_list
 
Thanks Or. This one is already in OFED 1.4.2 but apparently this is a
different problem. Once I have information whether the patch Roland
posted fixed it I will update the list.


[ewg] Re: [PATCH] libehca supported kernel versions

2009-09-24 Thread Vladimir Sokolovsky

Alexander Schmidt wrote:

> Hi Vlad,
>
> please apply the following patch for install.pl.
>
> Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com
>
> Index: OFED-1.5-20090915-0844/install.pl
> ===
> --- OFED-1.5-20090915-0844.orig/install.pl
> +++ OFED-1.5-20090915-0844/install.pl
> @@ -1646,10 +1646,8 @@ sub set_availability
>  set_compilers();
>  
>  # Ehca
> -# if ($arch =~ m/ppc64|powerpc/ and
> -# $kernel =~ m/2.6.1[6-9]|2.6.2[0-9]/) {
>  if ($arch =~ m/ppc64|powerpc/ and
> -$kernel =~ m/2.6.30/) {
> +$kernel =~ m/2.6.1[6-9]|2.6.2[0-9]|2.6.30/) {
>  $kernel_modules_info{'ehca'}{'available'} = 1;
>  $packages_info{'libehca'}{'available'} = 1;
>  $packages_info{'libehca-devel-static'}{'available'} = 1;

  

Applied,

Regards,
Vladimir
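
For anyone checking which kernel strings the widened pattern in the patch
above accepts, here is a minimal Python sketch. It assumes the pattern behaves
the same under Python's re.search as under Perl's m//, which should hold for an
expression this simple: the dots are unescaped and the match is unanchored.
The helper name ehca_available and the stricter STRICT_RE variant are
hypothetical illustrations and are not part of install.pl.

```python
import re

# Pattern copied verbatim from the patched install.pl line. As in Perl's
# m/.../, re.search matches anywhere in the string and "." matches any character.
EHCA_KERNEL_RE = re.compile(r"2.6.1[6-9]|2.6.2[0-9]|2.6.30")

# Hypothetical stricter variant (NOT in the patch): literal dots, anchored at
# the start, and the version number must not be followed by another digit.
STRICT_RE = re.compile(r"^2\.6\.(1[6-9]|2[0-9]|30)(\D|$)")


def ehca_available(arch, kernel):
    """Mirror the arch + kernel test from the patch (hypothetical helper)."""
    arch_ok = re.search(r"ppc64|powerpc", arch) is not None
    kernel_ok = EHCA_KERNEL_RE.search(kernel) is not None
    return arch_ok and kernel_ok


if __name__ == "__main__":
    for kernel in ("2.6.16.60-0.21-ppc64", "2.6.27.19-5-ppc64", "2.6.30", "2.6.31"):
        print(kernel, ehca_available("ppc64", kernel))
```

With the loose pattern, strings such as 2.6.160 or 2x6y27 would also match;
whether that matters depends on which kernel strings the installer actually
sees.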


Re: [ewg] [PATCH] kernel_fixes: import a patch to fix bugzilla 1664

2009-09-24 Thread Vladimir Sokolovsky

Moni Shoua wrote:

> Add commit 5e47596bee12597824a3b5b21e20f80b61e58a35 to kernel fixes.
> This will fix https://bugs.openfabrics.org/show_bug.cgi?id=1664.
>
> Signed-off-by: Moni Shoua mo...@voltaire.com
> ---
>  kernel_patches/fixes/ipoib_0550_check_multicast_address_format.patch |   51 ++
>  1 file changed, 51 insertions(+)
  


Applied,

Regards,
Vladimir


[ewg] [PATCH] ofed-docs: A comment about ib-bonding and newer kernels

2009-09-24 Thread Moni Shoua
Add a comment about ib-bonding on distros that use newer kernels (e.g. SLES11)

Signed-off-by: Moni Shoua mo...@voltaire.com
---
 ipoib_release_notes.txt |    5 +++++
 1 file changed, 5 insertions(+)
diff --git a/ipoib_release_notes.txt b/ipoib_release_notes.txt
index adf1304..3c3d70f 100644
--- a/ipoib_release_notes.txt
+++ b/ipoib_release_notes.txt
@@ -271,6 +271,11 @@ Notes:
 * Using /etc/infiniband/openib.conf to create a persistent configuration is
   no longer supported
 * On RHEL4_U7, cannot set a slave interface as primary.
+* ib-bonding will not be compiled and installed with OFED on OS with kernel
+  that is >= 2.6.27. The bonding driver that comes with those kernels already
+  supports enslaving of IPoIB interfaces. However, there still might be a issue
+  of OS configuration tools (like sysconfig or initscripts) that needs a fix but
+  such issues were not observed yet.
 
 
 ===
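
The release note above hinges on a "kernel >= 2.6.27" test. As a rough
illustration of why such a test should compare numbers rather than strings,
here is a minimal Python sketch. The function names and the NATIVE_IPOIB_BONDING
constant are hypothetical; this is not how the OFED installer implements the
check, only one way the rule could be expressed.

```python
import re

# First kernel version that, per the release note, ships a bonding driver able
# to enslave IPoIB interfaces (the constant name is hypothetical).
NATIVE_IPOIB_BONDING = (2, 6, 27)


def kernel_version(release):
    """Return the leading major.minor.patch of a `uname -r` style string as ints."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in m.groups())


def ofed_installs_ib_bonding(release):
    """True when the kernel predates 2.6.27, so OFED's ib-bonding is still needed."""
    return kernel_version(release) < NATIVE_IPOIB_BONDING


if __name__ == "__main__":
    for release in ("2.6.9-78.EL", "2.6.18-128.el5", "2.6.27.19-5-default"):
        print(release, ofed_installs_ib_bonding(release))
```

A plain string comparison would rank "2.6.9" above "2.6.27", because "9" sorts
after "2"; comparing integer tuples avoids that.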


[ewg] [PATCH] ehca: backports for 2.6.27

2009-09-24 Thread Alexander Schmidt
Hi Vlad,

please apply the following ehca backports for 2.6.27. Thanks!

Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com

Index: ofa_kernel-1.5/kernel_patches/backport/2.6.27/ehca-010-undo_cpumask.patch
===
--- /dev/null
+++ ofa_kernel-1.5/kernel_patches/backport/2.6.27/ehca-010-undo_cpumask.patch
@@ -0,0 +1,42 @@
+---
+ drivers/infiniband/hw/ehca/ehca_irq.c |   14 ++++++++------
+ 1 file changed, 8 insertions(+), 6 deletions(-)
+
+Index: ofa_kernel-1.5/drivers/infiniband/hw/ehca/ehca_irq.c
+===
+--- ofa_kernel-1.5.orig/drivers/infiniband/hw/ehca/ehca_irq.c  2009-07-27 08:20:08.0 -0400
++++ ofa_kernel-1.5/drivers/infiniband/hw/ehca/ehca_irq.c   2009-07-27 08:26:31.0 -0400
+@@ -659,12 +659,12 @@
+ 
+   WARN_ON_ONCE(!in_interrupt());
+   if (ehca_debug_level >= 3)
+-  ehca_dmp(cpu_online_mask, cpumask_size(), "");
++  ehca_dmp(&cpu_online_map, sizeof(cpumask_t), "");
+ 
+   spin_lock_irqsave(&pool->last_cpu_lock, flags);
+-  cpu = cpumask_next(pool->last_cpu, cpu_online_mask);
++  cpu = next_cpu_nr(pool->last_cpu, cpu_online_map);
+   if (cpu >= nr_cpu_ids)
+-  cpu = cpumask_first(cpu_online_mask);
++  cpu = first_cpu(cpu_online_map);
+   pool->last_cpu = cpu;
+   spin_unlock_irqrestore(&pool->last_cpu_lock, flags);
+ 
+@@ -855,7 +855,7 @@
+   case CPU_UP_CANCELED_FROZEN:
+   ehca_gen_dbg("CPU: %x (CPU_CANCELED)", cpu);
+   cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
+-  kthread_bind(cct->task, cpumask_any(cpu_online_mask));
++  kthread_bind(cct->task, any_online_cpu(cpu_online_map));
+   destroy_comp_task(pool, cpu);
+   break;
+   case CPU_ONLINE:
+@@ -902,7 +902,7 @@
+   return -ENOMEM;
+ 
+   spin_lock_init(&pool->last_cpu_lock);
+-  pool->last_cpu = cpumask_any(cpu_online_mask);
++  pool->last_cpu = any_online_cpu(cpu_online_map);
+ 
+   pool->cpu_comp_tasks = alloc_percpu(struct ehca_cpu_comp_task);
+   if (pool->cpu_comp_tasks == NULL) {
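
The first hunk of the backport above swaps one cpumask API for another but
leaves the selection logic unchanged: take the next online CPU after the last
one used and wrap around to the first online CPU when none is left. The Python
sketch below models only that round-robin walk. The function name, the set
standing in for the online mask, and the nr_cpu_ids parameter are stand-ins
chosen for illustration; the locking done in the driver is omitted.

```python
def next_online_cpu(last_cpu, online, nr_cpu_ids):
    """Pick the online CPU after last_cpu, wrapping to the lowest online CPU.

    `online` is a set of CPU numbers standing in for the kernel's online mask.
    """
    later = [cpu for cpu in sorted(online) if cpu > last_cpu]
    # The kernel helpers return a value >= nr_cpu_ids when no further bit is
    # set; nr_cpu_ids plays the same "nothing found" role here.
    cpu = later[0] if later else nr_cpu_ids
    if cpu >= nr_cpu_ids:
        cpu = min(online)  # wrap around, like cpumask_first()/first_cpu()
    return cpu


if __name__ == "__main__":
    online = {0, 2, 3}  # CPU 1 is offline in this made-up example
    cpu = 0
    picks = []
    for _ in range(5):
        cpu = next_online_cpu(cpu, online, nr_cpu_ids=4)
        picks.append(cpu)
    print(picks)
```

Starting from CPU 0 with CPUs 0, 2 and 3 online, successive calls visit 2, 3,
0, 2, 3, so offline CPUs are skipped and the walk never stalls at the end of
the mask.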
Index: ofa_kernel-1.5/kernel_patches/backport/2.6.27/ehca-020-undo_unsigned_long.patch
===
--- /dev/null
+++ ofa_kernel-1.5/kernel_patches/backport/2.6.27/ehca-020-undo_unsigned_long.patch
@@ -0,0 +1,1005 @@
+Index: ofa_kernel-1.5/drivers/infiniband/hw/ehca/ehca_cq.c
+===
+--- ofa_kernel-1.5.orig/drivers/infiniband/hw/ehca/ehca_cq.c   2009-07-26 09:08:48.0 -0400
++++ ofa_kernel-1.5/drivers/infiniband/hw/ehca/ehca_cq.c    2009-07-27 08:59:04.0 -0400
+@@ -196,7 +196,7 @@
+ 
+   if (h_ret != H_SUCCESS) {
+   ehca_err(device, "hipz_h_alloc_resource_cq() failed "
+-   "h_ret=%lli device=%p", h_ret, device);
++   "h_ret=%li device=%p", h_ret, device);
+   cq = ERR_PTR(ehca2ib_return_code(h_ret));
+   goto create_cq_exit2;
+   }
+@@ -232,7 +232,7 @@
+ 
+   if (h_ret < H_SUCCESS) {
+   ehca_err(device, "hipz_h_register_rpage_cq() failed "
+-   "ehca_cq=%p cq_num=%x h_ret=%lli counter=%i "
++   "ehca_cq=%p cq_num=%x h_ret=%li counter=%i "
+    "act_pages=%i", my_cq, my_cq->cq_number,
+    h_ret, counter, param.act_pages);
+   cq = ERR_PTR(-EINVAL);
+@@ -244,7 +244,7 @@
+   if ((h_ret != H_SUCCESS) || vpage) {
+   ehca_err(device, "Registration of pages not "
+    "complete ehca_cq=%p cq_num=%x "
+-   "h_ret=%lli", my_cq, my_cq->cq_number,
++   "h_ret=%li", my_cq, my_cq->cq_number,
+    h_ret);
+   cq = ERR_PTR(-EAGAIN);
+   goto create_cq_exit4;
+@@ -252,7 +252,7 @@
+   } else {
+   if (h_ret != H_PAGE_REGISTERED) {
+   ehca_err(device, "Registration of page failed "
+-   "ehca_cq=%p cq_num=%x h_ret=%lli "
++   "ehca_cq=%p cq_num=%x h_ret=%li "
+    "counter=%i act_pages=%i",
+    my_cq, my_cq->cq_number,
+    h_ret, counter, param.act_pages);
+@@ -266,7 +266,7 @@
+ 
+   gal = my_cq->galpas.kernel;
+   cqx_fec = hipz_galpa_load(gal, CQTEMM_OFFSET(cqx_fec));
+-  ehca_dbg(device, "ehca_cq=%p cq_num=%x CQX_FEC=%llx",
++  ehca_dbg(device, "ehca_cq=%p cq_num=%x CQX_FEC=%lx",
+my_cq, 

[ewg] Re: [PATCH] ehca: backports for 2.6.27

2009-09-24 Thread Vladimir Sokolovsky

Alexander Schmidt wrote:

> Hi Vlad,
>
> please apply the following ehca backports for 2.6.27. Thanks!
>
> Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com
  


Applied,

Regards,
Vladimir


[ewg] RE: Possible process deadlock in RMPP flow

2009-09-24 Thread Sean Hefty
> Thanks Or. This one is already in OFED 1.4.2 but apparently this is a
> different problem. Once I have information whether the patch Roland
> posted fixed it I will update the list.

If ibnetdiscover doesn't use RMPP as Hal indicated, I don't think Roland's patch
will help.
