[ewg] RE: [PATCH] ISER: fix compilation issues on Lustre kernels based on RHEL4.0U[4-6].

2008-01-22 Thread Moshe Kazir
We'll try to find a computer and test it.

Moshe 



Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  

-----Original Message-----
From: Vladimir Sokolovsky [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 21, 2008 6:08 PM
To: Moshe Kazir; Erez Zilber
Cc: OpenFabricsEWG; Nir Gal; Yair Ifergan
Subject: [PATCH] ISER: fix compilation issues on Lustre kernels based on
RHEL4.0U[4-6].

Hi Moshe,
The following patch fixes OFED-1.3 compilation issue on Lustre kernels:

ISER: fix compilation issues on Lustre kernels based on RHEL4.0U[4-6].

Signed-off-by: Vladimir Sokolovsky [EMAIL PROTECTED]
---
diff --git a/kernel_patches/backport/2.6.9_U4/iscsi_06_scsi_addons.patch
b/kernel_patches/backport/2.6.9_U4/iscsi_06_scsi_addons.patch
index 1b5af7b..2c6d71f 100644
--- a/kernel_patches/backport/2.6.9_U4/iscsi_06_scsi_addons.patch
+++ b/kernel_patches/backport/2.6.9_U4/iscsi_06_scsi_addons.patch
@@ -69,7 +69,7 @@ index e212608..3bf2015 100644
   obj-$(CONFIG_SCSI_ISCSI_ATTRS) += scsi_transport_iscsi.o
   obj-$(CONFIG_ISCSI_TCP)+= libiscsi.o   iscsi_tcp.o
  +
-+CFLAGS_attribute_container.o = -I$(PWD)/kernel_addons/backport/2.6.9_U4/include/src/
++CFLAGS_attribute_container.o =   $(BACKPORT_INCLUDES)/src/
  +
  +scsi_transport_iscsi-y := scsi_transport_iscsi_f.o scsi.o scsi_lib.o
init.o klist.o attribute_container.o transport_class.o
  +libiscsi-y := libiscsi_f.o scsi_scan.o
diff --git a/kernel_patches/backport/2.6.9_U5/iscsi_06_scsi_addons.patch
b/kernel_patches/backport/2.6.9_U5/iscsi_06_scsi_addons.patch
index 1b5af7b..2c6d71f 100644
--- a/kernel_patches/backport/2.6.9_U5/iscsi_06_scsi_addons.patch
+++ b/kernel_patches/backport/2.6.9_U5/iscsi_06_scsi_addons.patch
@@ -69,7 +69,7 @@ index e212608..3bf2015 100644
   obj-$(CONFIG_SCSI_ISCSI_ATTRS) += scsi_transport_iscsi.o
   obj-$(CONFIG_ISCSI_TCP)+= libiscsi.o   iscsi_tcp.o
  +
-+CFLAGS_attribute_container.o = -I$(PWD)/kernel_addons/backport/2.6.9_U4/include/src/
++CFLAGS_attribute_container.o =   $(BACKPORT_INCLUDES)/src/
  +
  +scsi_transport_iscsi-y := scsi_transport_iscsi_f.o scsi.o scsi_lib.o
init.o klist.o attribute_container.o transport_class.o
  +libiscsi-y := libiscsi_f.o scsi_scan.o
diff --git a/kernel_patches/backport/2.6.9_U6/iscsi_06_scsi_addons.patch
b/kernel_patches/backport/2.6.9_U6/iscsi_06_scsi_addons.patch
index 1b5af7b..2c6d71f 100644
--- a/kernel_patches/backport/2.6.9_U6/iscsi_06_scsi_addons.patch
+++ b/kernel_patches/backport/2.6.9_U6/iscsi_06_scsi_addons.patch
@@ -69,7 +69,7 @@ index e212608..3bf2015 100644
   obj-$(CONFIG_SCSI_ISCSI_ATTRS) += scsi_transport_iscsi.o
   obj-$(CONFIG_ISCSI_TCP)+= libiscsi.o   iscsi_tcp.o
  +
-+CFLAGS_attribute_container.o = -I$(PWD)/kernel_addons/backport/2.6.9_U4/include/src/
++CFLAGS_attribute_container.o =   $(BACKPORT_INCLUDES)/src/
  +
  +scsi_transport_iscsi-y := scsi_transport_iscsi_f.o scsi.o scsi_lib.o
init.o klist.o attribute_container.o transport_class.o
  +libiscsi-y := libiscsi_f.o scsi_scan.o
diff --git a/ofed_scripts/makefile b/ofed_scripts/makefile
index bcc55fe..cb89d00 100644
--- a/ofed_scripts/makefile
+++ b/ofed_scripts/makefile
@@ -67,7 +67,7 @@ kernel:
@echo Kernel version: $(KVERSION)
@echo Modules directory: $(DESTDIR)/$(MODULES_DIR)
@echo Kernel sources: $(KSRC)
-   env CWD=$(CWD) \
+   env CWD=$(CWD) BACKPORT_INCLUDES=$(BACKPORT_INCLUDES) \
$(MAKE) -C $(KSRC) SUBDIRS=$(CWD) \
V=1 $(WITH_MAKE_PARAMS) \
CONFIG_MEMTRACK=$(CONFIG_MEMTRACK) \
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCH] IB/iser: Add iSER fix that will be merged into 2.6.25

2008-01-22 Thread Vladimir Sokolovsky

Erez Zilber wrote:

This fix adds a printk before initiating a BUG() when
receiving an unhandled RDMA-CM event.

Signed-off-by: Erez Zilber [EMAIL PROTECTED]
---
 ...nformation_about_unhandled_RDMA_CM_events.patch |   33 
 1 files changed, 33 insertions(+), 0 deletions(-)
 create mode 100644 
kernel_patches/fixes/iser_01_Print_information_about_unhandled_RDMA_CM_events.patch

diff --git 
a/kernel_patches/fixes/iser_01_Print_information_about_unhandled_RDMA_CM_events.patch
 
b/kernel_patches/fixes/iser_01_Print_information_about_unhandled_RDMA_CM_events.patch
new file mode 100644
index 000..4b96c8f
--- /dev/null
+++ 
b/kernel_patches/fixes/iser_01_Print_information_about_unhandled_RDMA_CM_events.patch
@@ -0,0 +1,33 @@
+Print information about unhandled RDMA CM events
+
+Some RDMA CM events are not supported or not handled in iSER.
+This patch adds some info (printk) for the user about them.
+
+Signed-off-by: Erez Zilber [EMAIL PROTECTED]
+---
+ drivers/infiniband/ulp/iser/iser_verbs.c |6 ++
+ 1 files changed, 2 insertions(+), 4 deletions(-)
+
+diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c 
b/drivers/infiniband/ulp/iser/iser_verbs.c
+index 654a4dc..675d00b 100644
+--- a/drivers/infiniband/ulp/iser/iser_verbs.c
++++ b/drivers/infiniband/ulp/iser/iser_verbs.c
+@@ -475,13 +475,11 @@ static int iser_cma_handler(struct rdma_cm_id *cma_id, 
struct rdma_cm_event *eve
+   iser_disconnected_handler(cma_id);
+   break;
+   case RDMA_CM_EVENT_DEVICE_REMOVAL:
++  iser_err("Device removal is currently unsupported\n");
+   BUG();
+   break;
+-  case RDMA_CM_EVENT_CONNECT_RESPONSE:
+-  BUG();
+-  break;
+-  case RDMA_CM_EVENT_CONNECT_REQUEST:
+   default:
++  iser_err("Unexpected RDMA CM event (%d)\n", event->event);
+   break;
+   }
+   return ret;
+-- 
+1.5.3.7

+


Applied to the ofed_1_3/linux-2.6.git ofed_kernel.

Regards,
Vladimir
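
For readers following the thread, the behaviour the patch describes
(log first, then either stop or carry on) can be sketched in user-space
C. This is only an illustration under stated assumptions: the sketch_*
names and the enum values are hypothetical stand-ins, not the kernel's
RDMA CM definitions, and abort() merely plays the role that BUG() has
in the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for RDMA CM event codes (not the kernel values). */
enum sketch_cm_event {
    SKETCH_EVENT_DISCONNECTED,
    SKETCH_EVENT_DEVICE_REMOVAL,
    SKETCH_EVENT_CONNECT_REQUEST,
    SKETCH_EVENT_CONNECT_RESPONSE
};

enum sketch_action { SKETCH_HANDLED, SKETCH_FATAL, SKETCH_IGNORED };

/* Decide what to do with an event, mirroring the patched switch statement. */
enum sketch_action sketch_classify(int event)
{
    switch (event) {
    case SKETCH_EVENT_DISCONNECTED:
        return SKETCH_HANDLED;
    case SKETCH_EVENT_DEVICE_REMOVAL:
        return SKETCH_FATAL;      /* logged, then treated as fatal */
    default:
        return SKETCH_IGNORED;    /* logged, then execution continues */
    }
}

/* Print the diagnostic before acting, so the log explains a later abort. */
void sketch_handle(int event)
{
    assert(event >= 0);           /* event codes are non-negative here */

    switch (sketch_classify(event)) {
    case SKETCH_FATAL:
        fprintf(stderr, "Device removal is currently unsupported\n");
        abort();                  /* user-space analogue of BUG() */
    case SKETCH_IGNORED:
        fprintf(stderr, "Unexpected RDMA CM event (%d)\n", event);
        break;
    case SKETCH_HANDLED:
        break;
    }
}
```

Keeping the classification separate from the logging makes the policy
easy to check without triggering the abort path.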


[ewg] OFED Jan 21 meeting summary on RC2 status

2008-01-22 Thread Tziporet Koren
 
OFED Jan-21 meeting summary on OFED 1.3-rc2 status

Meeting summary:
--
* RC2 is in good shape, apart from some bugs that should be fixed for RC3
* RC3 is planned for next week

Meeting details:

1. Review RC2 status
   * QLogic - status is good with their VNIC and general tests; they see
     issues with qperf on RDS
   * Intel - RC2 is OK, mvapich is fixed for ia64
   * Mellanox - regression is good and stable; cleaning up SDP bugs
   * IBM - Status is good; PPC issues resolved
   * NetEffect - Status is OK
   * Chelsio - Testing is progressing well
   * Cisco - no update
   * Voltaire - has issues with bonding and IPoIB performance
   * MPI - all MPI packages are in good shape

2. Update on tasks that should be completed for RC2:
   * XRC - enhanced API - should be submitted today
   * IPoIB - need to resolve the new issue reported by Voltaire


Tziporet


Re: [ewg] RDS - Recovering from RDMA errors

2008-01-22 Thread Olaf Kirch
On Sunday 20 January 2008 20:57, Roland Dreier wrote:
> If you could send me some code and a recipe to get the bogus CQ
> message, that might be helpful.  Because as far as I can see, there
> shouldn't be any way for a consumer to get that message without a bug
> in the low-level driver.  It's fine if it's a whole big RDS test case,
> I just want to be able to run the test and instrument the low-level
> driver to get a better handle on what's happening.

Okay, I put my current patch queue into a git tree. It's in
the testing branch of

git://www.openfabrics.org/~okir/ofed_1_3/linux-2.6.git
git://www.openfabrics.org/~okir/ofed_1_3/rds-tools.git

In order to reproduce the problem, I usually run

while sleep 1; do
    rds-stress -R -r locip -s remip -p 4000 -c -d2 -t8 -T5 -D1m
done

Within minutes, I get syslog messages saying

Timed out waiting for CQs to be drained - recv: 0 entries, send: 4 entries left

This message originates from net/rds_ib_cm.c - as a workaround, I added
a timeout of 1 second when waiting for the WQs to be drained. I usually
get those stalls after a WQE completes with status 10 (or sometimes 4).
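
The idea of the workaround, giving up on the drain after a bounded wait
instead of blocking forever, can be sketched as follows. This is not the
RDS code: every sketch_* name is hypothetical, and a poll budget stands
in for the 1 second wall-clock timeout mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: reports how many entries are still outstanding. */
typedef unsigned int (*sketch_pending_fn)(void *ctx);

/*
 * Poll up to max_polls times (max_polls must be at least 1).
 * Returns true once nothing is pending, false if the budget runs out;
 * *left receives the last outstanding count that was observed.
 */
bool sketch_wait_drained(sketch_pending_fn pending, void *ctx,
                         unsigned int max_polls, unsigned int *left)
{
    unsigned int n = 0;
    unsigned int polls;

    assert(max_polls > 0);
    for (polls = 0; polls < max_polls; polls++) {
        /* A real implementation would sleep or wait for an event here. */
        n = pending(ctx);
        if (n == 0)
            break;
    }
    *left = n;
    return n == 0;
}

/* Example source: every poll retires one entry from the counter in ctx. */
unsigned int sketch_countdown(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining > 0)
        (*remaining)--;
    return *remaining;
}

/* Convenience wrapper: entries left after a bounded wait. */
unsigned int sketch_demo(unsigned int entries, unsigned int max_polls)
{
    unsigned int left = 0;

    sketch_wait_drained(sketch_countdown, &entries, max_polls, &left);
    return left;
}
```

With 4 entries and a budget of 2 polls the wait gives up with 2 entries
left, which is the same shape as the "send: 4 entries left" message: the
caller learns how much was still outstanding when it stopped waiting.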

> BTW, what kind of HCA are you using for this testing?

A pair of fairly new Mellanox cards.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


[ewg] Question regarding non srq patch for OFED 1.3

2008-01-22 Thread Pradeep Satyanarayana
Some HCAs like ehca do not natively support srq. In order to enable IPoIB CM
for such HCAs, I have developed a nonsrq patch. This patch has been accepted
into Roland's for-2.6.25 git tree for about 3 months now.

I am working on porting that to OFED 1.3 and it will take me at least several
days to finish the port and test it. Is there a date by which I need to complete
it for its inclusion into OFED 1.3?

Pradeep



Re: [ewg] RDS - Recovering from RDMA errors

2008-01-22 Thread Roland Dreier
> > BTW, what kind of HCA are you using for this testing?
>
> A pair of fairly new Mellanox cards.

How new?  Is it ConnectX or something older -- ie do you use the
ib_mthca or mlx4_ib driver?  If you're using mlx4, then I could
believe there is a firmware bug that leads to lost completions.

 - R.


Re: [ewg] RE: [ofa-general] OFED Jan 14 meeting summary on RC2readiness

2008-01-22 Thread Roland Dreier
> > I guess you mean just implement XRC without allowing multiple
> > processes to share an XRC domain?  That actually seems like a sensible
> > thing to implement as well...
>
> This is part of the current XRC implementation -- just give -1 as the fd
> value in ibv_open_xrc_domain().

I *think* Gleb's point was that the XRC implementation could be much
simpler if this were the *only* case supported -- you wouldn't need
all the complexity of kernel receive QPs etc I guess.  Gleb, is that
what you meant?


[ewg] Re: Question regarding non srq patch for OFED 1.3

2008-01-22 Thread Tziporet Koren

Pradeep Satyanarayana wrote:

Some HCAs like ehca do not natively support srq. In order to enable IPoIB CM
for such HCAs, I have developed a nonsrq patch. This patch has been accepted
into Roland's for-2.6.25 git tree for about 3 months now.

I am working on porting that to OFED 1.3 and it will take me at least several
days to finish the port and test it. Is there a date by which I need to complete
it for its inclusion into OFED 1.3?

Pradeep

If I remember correctly, this is not a small patch, so I don't know whether
it is too late for OFED 1.3, since it may delay the February release.
However, we can discuss this at the next OFED meeting on Monday and see
what other people think.


Tziporet