Re: [ofa-general] Fedora 11, kernel 2.6.31

2009-09-22 Thread Jack Morgenstein
On Tuesday 22 September 2009 21:58, Nathan Stratton wrote: > > Having an issue with getting verbs working on 2.6.31. I am running Fedora > 11 with 2.6.31 and 1.1.2-0.1.gb00dc7d libibverbs. Everything looks great > until I run ibv_srq_pingpong to the server. It shows local/remote address > a bun

[ofa-general] [PATCH] mthca: Fix access to freed memory in catas processing

2009-09-21 Thread Jack Morgenstein
catas_reset() uses a pointer to mthca_dev, but mthca_dev may not be valid after the call to __mthca_restart_one(). Based on a similar patch for mlx4 by Vitaliy Gusev Signed-off-by: Jack Morgenstein --- Roland, Here is the equivalent patch for mthca catas error processing. Here, also, we
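The pattern behind this fix is a use-after-free: catas_reset() keeps using its mthca_dev pointer after __mthca_restart_one() may have released that object. The general remedy is to copy whatever you still need out of the object before making the call that may free it. Below is a minimal userspace sketch of that remedy. struct fake_dev, restart_one() and catas_reset_fixed() are hypothetical stand-ins, not mthca code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct mthca_dev; not the real driver layout. */
struct fake_dev {
	char name[16];
	int  restart_count;
};

/* Hypothetical stand-in for __mthca_restart_one(): it tears down the old
 * device object and allocates a fresh one, so any saved copy of the old
 * pointer is dangling once this returns. Returns 0 on success. */
static int restart_one(struct fake_dev **slot)
{
	struct fake_dev *fresh = malloc(sizeof(*fresh));

	if (!fresh)
		return -1;
	memcpy(fresh, *slot, sizeof(*fresh));
	fresh->restart_count++;
	free(*slot);          /* the old object is gone from here on */
	*slot = fresh;
	return 0;
}

/* Fixed pattern: snapshot what is needed for logging BEFORE the restart,
 * and never dereference the old object afterwards. */
static int catas_reset_fixed(struct fake_dev **slot, char *logbuf, size_t len)
{
	char name[sizeof((*slot)->name)];
	int err;

	memcpy(name, (*slot)->name, sizeof(name));
	err = restart_one(slot);
	snprintf(logbuf, len, "%s: reset %s", name, err ? "failed" : "done");
	return err;
}
```

After catas_reset_fixed() returns, only the updated pointer in the slot is valid. The local copy of the name is what keeps the log message safe.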

Re: [ofa-general] Fedora 10 OFED support plans

2009-09-13 Thread Jack Morgenstein
On Thursday 10 September 2009 23:00, Jeremy Enos wrote: > Fails w/ ofa_kernel like the others have... I didn't test excluding this rpm > with FC11, but the others also fail elsewhere w/ this rpm excluded- so I'm > guessing FC11 would as well.  I included the output (and last 50 lines of > log)

Re: [ofa-general] Installing SDP on existing OFED 1.3.1 install - DRBD SDP/Infiniband Support

2009-09-01 Thread Jack Morgenstein
dules/2.6.18-92.1.13.el5xen/kernel/drivers/infiniband/ulp/sdp > /lib/modules/2.6.18-92.1.13.el5xen/kernel/drivers/infiniband/ulp/sdp/ib_ > sdp.ko > /lib/modules/2.6.18-92.el5/kernel/drivers/infiniband/ulp/sdp > /lib/modules/2.6.18-92.el5/kernel/drivers/infiniband/ulp/sdp/ib_

Re: [ofa-general] Installing SDP on existing OFED 1.3.1 install - DRBD SDP/Infiniband Support

2009-09-01 Thread Jack Morgenstein
; > Configured devices: > ib0 > > Currently active devices: > ib0 > > The following OFED modules are loaded: > > rdma_ucm > rdma_cm > ib_addr > ib_ipoib > mlx4_core > mlx4_ib > ib_mthca > ib_uverbs > ib_umad > ib_sa > ib_

Re: [ofa-general] Installing SDP on existing OFED 1.3.1 install - DRBD SDP/Infiniband Support

2009-09-01 Thread Jack Morgenstein
On Tuesday 01 September 2009 13:44, Robert Dunkley wrote: > Hi everyone, > > A DRBD release candidate with specific SDP/Infiniband support was > released last week. > > I have an existing OFED 1.3.1 install without the SDP protocol loaded, I > need to add it. I still have the original source I in

Re: [ofa-general] Fedora 10 OFED support plans

2009-08-31 Thread Jack Morgenstein
> >>> I think OFED 1.5 might work on it but not sure. Which kernel version > >>> FC10 use? > >>> In general OFED 1.5 supports FC11 > >>> > >> Actually, it supports FC12 (kernel 2.6.29). > >> > > We had originally planned to support FC11 -- however, in the interim, FC12 > > was > > rel

Re: [ofa-general] Fedora 10 OFED support plans

2009-08-30 Thread Jack Morgenstein
On Sunday 30 August 2009 18:47, Jack Morgenstein wrote: > On Sunday 23 August 2009 11:16, Tziporet Koren wrote: > > Jeremy Enos wrote: > > > Coming up on a year of Fedora 10 GA... Fedora 9 no longer maintained. > > > No OFED support for FC10 yet creates a t

Re: [ofa-general] Fedora 10 OFED support plans

2009-08-30 Thread Jack Morgenstein
On Sunday 23 August 2009 11:16, Tziporet Koren wrote: > Jeremy Enos wrote: > > Coming up on a year of Fedora 10 GA... Fedora 9 no longer maintained. > > No OFED support for FC10 yet creates a tough spot if trying to stay > > secure. Is there *any* version (1.5, etc) that will even build on FC10?

Re: [ofa-general] Number of devices returned by ibv_get_device_list()

2009-08-30 Thread Jack Morgenstein
On Wednesday 26 August 2009 02:03, MANIKANTAN KALAIYA wrote: > Resending to the mailing list... > > We have Ofed1.3.1 installed, one of the sub packages is libibverbs version > 1.1.1. We have a small program that lists the number of IB cards available in > the system through ibv_get_device_list(

Re: [ofa-general] Fwd: OFED-1.5-alpha4 installation problem

2009-08-30 Thread Jack Morgenstein
On Wednesday 26 August 2009 13:17, Sneha Mistry wrote: > Hi, > > I am a newbie to Infiniband and trying to install OFED-1.5-alpha4 on > opensuse 10.3 . > Kernel version is 2.6.26-2-686 . 1. OFED 1.5 is not supported on OpenSuse 10.3 -- it is supported on OpenSuse 11. 2. You are correct in that the

[ofa-general] [PATCH V3] mlx4: Do not allow ib userspace open following a fatal event

2009-08-30 Thread Jack Morgenstein
device comes back up, thus preventing the above deadlock. V2: move active flag from net to hw/mlx4, and use only for fatal event flow. (per feedback from Roland). V3: fixed checkpatch.pl warnings. Signed-off-by: Jack Morgenstein --- Roland, Sorry about the checkpatch.pl oversight. No excuse
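As described in this series, the fix hinges on a per-device "active" flag. The flag is cleared on a fatal event and checked before a new userspace context is allocated, so a new user-level open cannot race with the device reset. The following is a minimal sketch under stated assumptions. struct hca and the hca_* functions are hypothetical. Returning -EAGAIN is an assumption and is not taken from the patch text. A real driver would also need locking around the flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; the patch keeps a similar flag in hw/mlx4. */
struct hca {
	bool active;          /* cleared on fatal event, set when device is back up */
	int  open_contexts;   /* userspace contexts handed out so far */
};

static void hca_fatal_event(struct hca *dev)
{
	dev->active = false;  /* refuse new userspace opens from now on */
}

static void hca_device_up(struct hca *dev)
{
	dev->active = true;
}

/* Userspace open path. Returns 0 on success, or -EAGAIN (assumed errno)
 * while the device is inactive. Real code must lock so that the check and
 * the increment cannot race with hca_fatal_event(). */
static int hca_alloc_ucontext(struct hca *dev)
{
	if (!dev->active)
		return -EAGAIN;
	dev->open_contexts++;
	return 0;
}
```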

[ofa-general] [PATCH] mthca: Do not allow ib userspace open following device internal error

2009-08-12 Thread Jack Morgenstein
device comes back up, thus preventing the above deadlock. Signed-off-by: Jack Morgenstein --- Roland, You are right, mthca also needs such a patch. This will prevent user-level apps from allocating a device context following a device internal catastrophic error. BTW, if the administrator has d

[ofa-general] Re: [PATCH V2] mlx4: Do not allow ib userspace open while device is being removed

2009-08-12 Thread Jack Morgenstein
On Tuesday 11 August 2009 19:23, Roland Dreier wrote: > > > this is a continuation of thread: > > http://lists.openfabrics.org/pipermail/general/2009-July/060668.html > > I see you > didn't answer the question about mthca -- does it suffer from this > problem as well? > Sorry about that. Yes,

[ofa-general] [PATCH V2] mlx4: Do not allow ib userspace open while device is being removed

2009-08-11 Thread Jack Morgenstein
device comes back up, thus preventing the above deadlock. V2: move active flag from net to hw/mlx4, and use only for fatal event flow. (per feedback from Roland). Signed-off-by: Jack Morgenstein --- Roland, this is a continuation of thread: http://lists.openfabrics.org/pipermail/general/2009-Ju

Re: [ofa-general] Re: [PATCH 10/14] infiniband: use printk_once

2009-08-11 Thread Jack Morgenstein
On Monday 10 August 2009 20:42, Roland Dreier wrote: > > > I'm a bit nervous about this one. > > printk_once will print once ONLY if CONFIG_PRINTK is set in > include/linux/autoconf.h > > (i.e., when the kernel is configured). Otherwise, it gets defined to > printk -- > > and it will alwa

Re: [ofa-general] Re: [PATCH 10/14] infiniband: use printk_once

2009-08-09 Thread Jack Morgenstein
I'm a bit nervous about this one. printk_once will print once ONLY if CONFIG_PRINTK is set in include/linux/autoconf.h (i.e., when the kernel is configured). Otherwise, it gets defined to printk -- and it will always print in this case. (see 2.6.30.xx kernel include file "include/linux/kernel.h
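The concern raised here can be illustrated in userspace. A print-once macro gives each call site its own static flag. A fallback that simply forwards to the plain print function fires on every call. This is a simplified sketch and not the kernel macro, which is written as a GNU statement expression. print_once, print_once_fallback and lines_printed are hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>

static int lines_printed;   /* counts emitted messages, for demonstration only */

/* Print-once idiom: each expansion gets its own static flag, so a given
 * call site prints at most once no matter how often it is reached. */
#define print_once(...)                                 \
	do {                                            \
		static bool printed_once_flag;          \
		if (!printed_once_flag) {               \
			printed_once_flag = true;       \
			printf(__VA_ARGS__);            \
			lines_printed++;                \
		}                                       \
	} while (0)

/* Hypothetical fallback when the once-only variant is not available:
 * it degenerates to a plain print, so it fires on every call. */
#define print_once_fallback(...)                        \
	do {                                            \
		printf(__VA_ARGS__);                    \
		lines_printed++;                        \
	} while (0)

static void warn_path(int use_fallback)
{
	if (use_fallback)
		print_once_fallback("deprecated option used\n");
	else
		print_once("deprecated option used\n");
}
```

Calling warn_path(0) three times emits one line. Calling warn_path(1) three times emits three.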

Re: [ofa-general] OFED on Centos with 2.6.30.4 generic kernel

2009-08-04 Thread Jack Morgenstein
On Tuesday 04 August 2009 12:58, Robert Dunkley wrote: > I'm a bit of newbie to kernel building but work on my first custom > kernel seems to be going well so far. > > The issue I have is the systems this kernel is destined for are using > Mellanox infiniband cards, IPOIB (CM), RDMA and Subnet Ma

Re: [ofa-general] IPoIB post_send failed

2009-07-30 Thread Jack Morgenstein
On Wednesday 29 July 2009 21:42, Hal Rosenstock wrote: > > > I know I'm going to hear it but it's not under my control :-) > > It's whatever is in OFED 1.4.1. kernel is some 2.6.18 variant using > mlx4.v1.0 (April 4, 2008) using x86_64 arch. > Hal, 1. Did you install userspace from OFED 1.4.1,

Re: [ofa-general] OFED 1.5 alpha release is available

2009-07-28 Thread Jack Morgenstein
On Thursday 23 July 2009 21:17, Tziporet Koren wrote: > OFED 1.5-alpha4 is available > > o Linux Operating Systems: ... > - OpenSuSE 10.3:2.6.22.5-31 * Correction: OpenSuSE 11: 2.6.25.5-1.1-default * (OpenSuSE 10.3 is not supported under OFED 1.5)

Re: [ofa-general] ofa_1_5_kernel 20090723-0200 daily build status

2009-07-23 Thread Jack Morgenstein
Andy, This snippet is from the EWG list, regarding the daily build of OFED 1.5 (which is based on kernel 2.6.30). Note the failure below (when compiling on kernel 2.6.26). Please note that rds will fail in ALL backports (i.e., kernel 2.6.29 and earlier), because the 'DECLARE_PER_CPU_SHARED_ALIGNED

[ofa-general] Re: [ewg] [Patch mthca backport] Don't use kmalloc > 128k

2009-07-23 Thread Jack Morgenstein
On Thursday 16 July 2009 21:08, Doug Ledford wrote: > On rhel4 and rhel5 machines, the kmalloc implementation does not > automatically forward kmalloc requests > 128kb to __get_free_pages. > Please include this patch in all rhel4 and rhel5 backport directories > so that we do the right thing
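The backport idea can be sketched as a size check. Requests up to 128 KiB stay with kmalloc. Larger ones go to the page allocator, with an order computed the way get_order() does it: the smallest n such that PAGE_SIZE << n >= size. This is a hedged userspace model that assumes 4 KiB pages. pick_alloc_path() and sketch_get_order() are hypothetical helpers, not the code in Doug's patch.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12                       /* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)
#define KMALLOC_LIMIT     ((size_t)128 * 1024)     /* 128 KiB cut-off from the thread */

/* Smallest order n such that (PAGE_SIZE << n) >= size, for size > 0. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

enum alloc_path { USE_KMALLOC, USE_FREE_PAGES };

/* Decide which allocator a backport wrapper would use for a given size;
 * *order is only written when the page allocator is chosen. */
static enum alloc_path pick_alloc_path(size_t size, unsigned int *order)
{
	if (size <= KMALLOC_LIMIT)
		return USE_KMALLOC;
	*order = sketch_get_order(size);
	return USE_FREE_PAGES;
}
```

A buffer obtained from the page allocator must be released with the matching page-free call and the same order, not with kfree(). A wrapper therefore has to remember which path it took.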

Re: [ofa-general] Re: [PATCH ibverbs] Make the gid argument to ibv_attach_mcast and ibv_detach_mcast const

2009-07-22 Thread Jack Morgenstein
On Monday 20 July 2009 21:53, Jason Gunthorpe wrote: > I have also patches for mlx4 and mthca to suppress the compiler > warning that results from this patch. ipath is OK as is, and I'm not > sure where the iwarp stuff lives.. > Is this change really necessary? Seems to me that you are creating c

[ofa-general] Re: [PATCH V2] mlx4: check for FW version which properly supports resize_cq

2009-07-15 Thread Jack Morgenstein
On Wednesday 15 July 2009 01:33, Roland Dreier wrote: > It occurs to me that one change that makes sense and would help make > this fix cleaner is the following -- since after all if a command # is > out of range, that's really a different error than if a low-level driver > just doesn't implement a
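Roland's point is that an out-of-range command number and a valid command that the low-level driver does not implement are different errors. One way to express the distinction is a dispatcher that returns -EINVAL for the first case and -ENOSYS for the second, driven by a per-driver bitmask of supported commands. The table, the mask and dispatch() below are a hypothetical sketch, not the uverbs code.

```c
#include <errno.h>
#include <stdint.h>

#define NUM_CMDS 8   /* hypothetical size of the command table */

typedef int (*cmd_fn)(void);

static int cmd_noop(void) { return 0; }

/* Hypothetical table; imagine slot 3 is resize_cq. */
static cmd_fn cmd_table[NUM_CMDS] = {
	cmd_noop, cmd_noop, cmd_noop, cmd_noop,
	cmd_noop, cmd_noop, cmd_noop, cmd_noop,
};

/* cmd_mask has one bit per command the low-level driver supports. */
static int dispatch(unsigned int cmd, uint64_t cmd_mask)
{
	if (cmd >= NUM_CMDS || !cmd_table[cmd])
		return -EINVAL;   /* not a valid command number at all */
	if (!(cmd_mask & (UINT64_C(1) << cmd)))
		return -ENOSYS;   /* valid command, but this driver lacks it */
	return cmd_table[cmd]();
}
```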

Re: [ofa-general] Compilation errors with OFED 1.4.1/ 1.4

2009-07-14 Thread Jack Morgenstein
On Monday 13 July 2009 19:52, pandit ib wrote: > > Looks like the OFED installation is faulty. > > Can we fix this issue in the next release of OFED? This is not an OFED issue, you need to fix your compilation script. > > For some reason, your compilation script is not taking directory > > /usr/

[ofa-general] [PATCH] mlx4: Do not allow ib userspace open while device is being removed

2009-07-12 Thread Jack Morgenstein
device comes back up, thus preventing the above deadlock. Signed-off-by: Jack Morgenstein --- Roland, For good measure, I also set the active flag to false at mlx4_ib_remove() -- to give some measure of protection against opening a new userspace app while the driver is in the process of bein

[ofa-general] Re: [PATCH V2] mlx4: check for FW version which properly supports resize_cq

2009-07-09 Thread Jack Morgenstein
On Thursday 09 July 2009 18:10, Roland Dreier wrote: > Or maybe it's cleaner to add a stub resize_cq method that just returns > ENOSYS that drivers can set when they don't actually implement it... > Basically, that is what the patch I submitted to you does. It's just that instead of having a differ

[ofa-general] Re: [PATCH V2] mlx4: check for FW version which properly supports resize_cq

2009-07-09 Thread Jack Morgenstein
On Wednesday 08 July 2009 18:58, Roland Dreier wrote: > This is kind of dopey, isn't it?  Seems cleaner just to leave the > resize_cq method unset if the hardware doesn't support it; then the core > takes care of this check for us. > Not so (unfortunately). The problem is that doing it the "corre

[ofa-general] [PATCH] mlx4: print out returned raw FW command status if a non-zero status is returned

2009-07-08 Thread Jack Morgenstein
The returned FW raw command status is invaluable in troubleshooting, and if a FW command error status is returned, we need to be able to see it (along with the command which caused the non-zero status). Signed-off-by: Jack Morgenstein diff --git a/drivers/net/mlx4/cmd.c b/drivers/net/mlx4/cmd.c

[ofa-general] [PATCH V2] mlx4: check for FW version which properly supports resize_cq

2009-07-08 Thread Jack Morgenstein
If a ConnectX card has a FW version installed which does not support resize cq, the resize_cq command will return -ENOSYS. Fixes Bugzilla 1415. Signed-off-by: Jack Morgenstein --- Roland, I submitted this on 2008-12-03, and somehow it fell through the cracks. I've regenerated it for you
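The following sketches a firmware-version gate for resize_cq. Versions are packed into one 64-bit value so that a single comparison decides whether to return -ENOSYS. The layout (major in the high 32 bits, then a 16-bit minor and a 16-bit subminor) is an assumption modelled on how mlx4 stores its FW version. The minimum good version is deliberately left as a parameter, because the thread excerpt does not give it. sketch_resize_cq() is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Pack major.minor.subminor into one comparable value (assumed layout). */
static uint64_t fw_ver(unsigned int major, unsigned int minor, unsigned int sub)
{
	return ((uint64_t)major << 32) | ((uint64_t)(minor & 0xffff) << 16) |
	       (sub & 0xffff);
}

/* Hypothetical resize_cq entry point: refuse with -ENOSYS when the
 * installed firmware predates the first version known to handle it. */
static int sketch_resize_cq(uint64_t installed, uint64_t first_good, int new_size)
{
	if (installed < first_good)
		return -ENOSYS;
	if (new_size <= 0)
		return -EINVAL;
	return 0;   /* a real driver would issue the firmware command here */
}
```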

Re: [ofa-general] upstream stable merge request for OFED 1.4

2009-07-07 Thread Jack Morgenstein
On Tuesday 07 July 2009 15:31, Lars Ellenberg wrote: > but I was wondering about the status of the > git://git.openfabrics.org/ofed_1_4/linux-2.6.git tree, > whether that is supposed to be the "most uptodate official ofed_1_4" > kernel, and whether or not it is going to be updated to either track >

Re: [ofa-general] Write combining support in OFED 1.5

2009-07-07 Thread Jack Morgenstein
Adding Eli Cohen (author of the huge-page support patch). Eli, what is missing on the PPC regarding huge page support? -Jack On Tuesday 07 July 2009 13:54, Or Gerlitz wrote: > Jack Morgenstein wrote: > > Yes, see kernel_patches/fixes/mlx4_0010_add_wc.patch. With OFED 1.5, I am >

Re: [ofa-general] Write combining support in OFED 1.5

2009-07-07 Thread Jack Morgenstein
On Tuesday 07 July 2009 13:03, Or Gerlitz wrote: > Jack Morgenstein wrote: > > I've been looking at the write-combining support in the 2.6.30 kernel, and > > it looks good [...] from the write-combining support in OFED 1.4: > > > Hi Jack, is there some WC related

Re: [ofa-general] Compilation errors with OFED 1.4.1/ 1.4

2009-07-07 Thread Jack Morgenstein
Looks like the OFED installation is faulty. The missing function declarations are all found in header files under directory (on your system) /usr/src/ofa_kernel/kernel_addons/backport/2.6.16_sles10_sp2. These header files must be taken before the regular kernel header files (they "include_next" to

[ofa-general] Write combining support in OFED 1.5

2009-07-06 Thread Jack Morgenstein
Hi Roland, I've been looking at the write-combining support in the 2.6.30 kernel, and it looks good. There is also a good solution for PPC write combining support in the kernel (adding #define pgprot_writecombine pgprot_noncached_wc to file arch/powerpc/include/asm/pgtable.h, per e-mail corres

[ofa-general] Re: Write combining on PPC64

2009-07-04 Thread Jack Morgenstein
Resending, adding Hoang-Nam Nguyen and Christoph Raisch of IBM. Please see the questions below. Also, who is the person at IBM who does Linux kernel development for the PPC? Thanks! -Jack On Sunday 28 June 2009 12:21, Jack Morgenstein wrote: > Hi Shirley, > > I was reviewing write-

Re: [ofa-general] IPoIB kernel Oops -- race condition

2009-06-30 Thread Jack Morgenstein
On Tuesday 30 June 2009 16:27, Yossi Etigin wrote: > > What do you think about renaming the mcast_task to mcast_join_task > and multicast_list to mcast_join_list? It will make the purpose and > the analogy between the two more obvious. I'll do that. > > static void __exit ipoib_cleanup_module(v

Re: [ofa-general] IPoIB kernel Oops -- race condition

2009-06-29 Thread Jack Morgenstein
o avoid race conditions (which may lead to a kernel Oops) between multicast join and multicast leave, we transfer leave processing to the workqueue (rather than do it in place). This fixes Bugzilla 1666. This fix was suggested by Yossi Etigin of Voltaire. Signed-off-by: Jack Morgenstein
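The fix moves leave processing onto the same workqueue that performs joins. The property it relies on is ordering: on a single-threaded queue, a leave queued after a join cannot start until that join has finished. The hypothetical ordered queue below models only that guarantee in userspace. It is not the IPoIB code, and it ignores locking and concurrent producers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical multicast group state. */
struct sketch_mcast {
	bool joined;
	bool freed;
};

typedef void (*work_fn)(struct sketch_mcast *);

struct sketch_work {
	work_fn fn;
	struct sketch_mcast *mc;
};

/* Minimal ordered queue: items run one at a time, in queueing order. */
#define WQ_LEN 16
struct sketch_wq {
	struct sketch_work items[WQ_LEN];
	size_t head, tail;
};

static bool sketch_queue_work(struct sketch_wq *wq, work_fn fn, struct sketch_mcast *mc)
{
	if (wq->tail - wq->head == WQ_LEN)
		return false;                     /* queue full */
	wq->items[wq->tail % WQ_LEN] = (struct sketch_work){ fn, mc };
	wq->tail++;
	return true;
}

static void sketch_run_queue(struct sketch_wq *wq)
{
	while (wq->head != wq->tail) {
		struct sketch_work w = wq->items[wq->head % WQ_LEN];

		wq->head++;
		w.fn(w.mc);
	}
}

static void join_task(struct sketch_mcast *mc)
{
	if (!mc->freed)
		mc->joined = true;
}

/* Queued on the same ordered queue as join, so it only runs after any
 * earlier join has fully completed. */
static void leave_task(struct sketch_mcast *mc)
{
	mc->joined = false;
	mc->freed = true;
}
```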

Re: [ofa-general] IPoIB kernel Oops -- race condition

2009-06-29 Thread Jack Morgenstein
On Monday 29 June 2009 18:06, Moni Shoua wrote: > Jack Morgenstein wrote: > > On Sunday 28 June 2009 19:09, Moni Shoua wrote: > >> maybe synchronizing the race with a completion var (like IPoIB does in > >> struct ipoib_path) will help. I think this will work. I can sen

Re: [ofa-general] IPoIB kernel Oops -- race condition

2009-06-28 Thread Jack Morgenstein
On Monday 29 June 2009 07:14, Jack Morgenstein wrote: > > > On second thought, maybe it would be simpler to just create an > ipoib_stop_task(), > and do everything ipoib_stop() does in that workqueue task. leave would thus > always > be executed in the workqueue. >

Re: [ofa-general] IPoIB kernel Oops -- race condition

2009-06-28 Thread Jack Morgenstein
On Sunday 28 June 2009 23:04, Yossi Etigin wrote: > How about making the leave/free mcast operation take place on the > ipoib_workqueue, on which > the join operation takes place? this way we can avoid this race, and more > potential races > of this kind. > On second thought, maybe it would be s

Re: [ofa-general] IPoIB kernel Oops -- race condition

2009-06-28 Thread Jack Morgenstein
On Sunday 28 June 2009 23:04, Yossi Etigin wrote: > Jack Morgenstein wrote: > > in ipoib_mcast_leave(): > > *** NEED TO WAIT HERE BEFORE CONTINUING (so that BUSY is cleared > > (mcast->mc is in error), > > *** or BUSY flag is set and mcast-&g

Re: [ofa-general] IPoIB kernel Oops -- race condition

2009-06-28 Thread Jack Morgenstein
On Sunday 28 June 2009 19:09, Moni Shoua wrote: > maybe synchronizing the race with a completion var (like IPoIB does in struct > ipoib_path) will help. I think this will work. I can send a patch if you want > unless you see this idea doesn't work for this case. > > MoniS I just looked at the ip

Re: [ofa-general] IPoIB kernel Oops -- race condition

2009-06-28 Thread Jack Morgenstein
On Sunday 28 June 2009 19:09, Moni Shoua wrote: > maybe synchronizing the race with a completion var > (like IPoIB does in struct ipoib_path) will help. I think this will work. > I can send a patch if you want unless you see this idea doesn't work for this > case. > Please do send a patch. Tha

[ofa-general] Write combining on PPC64

2009-06-28 Thread Jack Morgenstein
Hi Shirley, I was reviewing write-combining for the PPC on kernel 2.6.30, and noticed the following in file arch/powerpc/include/asm/pgtable.h: #define _PAGE_CACHE_CTL (_PAGE_COHERENT | _PAGE_GUARDED | _PAGE_NO_CACHE | \ _PAGE_WRITETHRU) ... #define pgprot_noncached_wc(p

[ofa-general] IPoIB kernel Oops -- race condition

2009-06-28 Thread Jack Morgenstein
We have seen the following kernel Oops on IPoIB: ib0: multicast join failed for ff12:401b::::::, status -22 Unable to handle kernel paging request for data at address 0x0054 adFaulting instruction address: 0xe60b43c4 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP

Re: [ofa-general] Re: [PATCH] Fixed memory leak in drivers/net/mlx4/main.c

2009-06-10 Thread Jack Morgenstein
On Monday 08 June 2009 18:08, Nicolas Morey-Chaisemartin wrote: > I'm still having difficulties to understand how mainstream code and ofed code > interacts. > The base kernel files (in this case, 2.6.30) are taken unmodified into OFED. Adjustments (patches) to the base kernel are placed in direct

[ofa-general] Re: [PATCH] Fixed memory leak in drivers/net/mlx4/main.c

2009-06-08 Thread Jack Morgenstein
OFED 1.5 is still based on 2.6.30-rc2. If this patch is in 2.6.30-rc8, we will grab it from the mainstream within the next couple of days (when we rebase to that RC). (For that reason, I'm not checking this in as a patch right now). -Jack On Monday 08 June 2009 11:52, Nicolas Morey-Chaisemartin

Re: [ofa-general] [PATCH] mlx4: fix post send of local invalidate and fast registration packets.

2009-06-07 Thread Jack Morgenstein
On Friday 05 June 2009 15:44, Tom Talpey wrote: > If you want, I'll dig up the git change. > Thanks, but no need. I know about that one. This is a different bug. -Jack ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org

[ofa-general] Re: [PATCH] mlx4: fix post send of local invalidate and fast registration packets.

2009-06-07 Thread Jack Morgenstein
On Friday 05 June 2009 20:31, Roland Dreier wrote: > > > Link to message towards end of thread (with very specific problem > description): > > http://lists.openfabrics.org/pipermail/general/2009-April/059253.html > > > This patch fixes the problem described in the thread. > > That is very us

[ofa-general] Re: [PATCH] mlx4: fix post send of local invalidate and fast registration packets.

2009-06-05 Thread Jack Morgenstein
On Friday 05 June 2009 02:47, Roland Dreier wrote: > > Please try to get this patch into 2.6.30 -- it is an important fix for > nfsrdma. > > Would be easier to get it in if you had a pointer to the NFS/RDMA bug > report. Not sure why you think this info isn't worth including in the > changelog.

[ofa-general] [PATCH] mlx4: fix post send of local invalidate and fast registration packets.

2009-06-04 Thread Jack Morgenstein
may begin execution too early). Signed-off-by: Jack Morgenstein --- Roland, Please try to get this patch into 2.6.30 -- it is an important fix for nfsrdma. Thanks! diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 20724ae..c4a0264 100644 --- a/drivers/infiniba

[ofa-general] [PATCH] mlx4: fix fast registration implementation

2009-05-07 Thread Jack Morgenstein
use of the crash was found by Vu Pham of Mellanox. The fix is along the lines suggested by Steve Wise in comment #21 in Bugzilla 1571. This patch fixes Bugzilla 1571. Signed-off-by: Jack Morgenstein --- Roland, please take this for kernel 2.6.30. diff --git a/drivers/infiniband/hw/mlx4/ml

Re: [ofa-general] OFED, the backported header and sg_init_table()

2009-05-05 Thread Jack Morgenstein
On Tuesday 05 May 2009 18:06, Jon Mason wrote: > No, we currently duplicate all the scatterlist functionality.  Including > ncrypto.h would greatly simplify the backport headers, but it is a > RHEL5.2/5.3 only solution.  If this change is needed for all other > backports, then a better solution wil

Re: [ofa-general] OFED, the backported header and sg_init_table()

2009-05-05 Thread Jack Morgenstein
On Monday 04 May 2009 17:56, Jon Mason wrote: > What's even worse is that sg_init_table is already defined in the > RHEL5.3 headers.  When coding up a header cleanup patch for RHEL5.3, I > noticed it was already defined in linux/ncrypto.h.  Also, it's there for > RHEL5.2 (and a few older kernels).

Re: [ofa-general] OFED, the backported header and sg_init_table()

2009-05-03 Thread Jack Morgenstein
On Saturday 02 May 2009 14:46, Bart Van Assche wrote: > Hello, > > Yesterday I installed OFED-1.4.1-rc4 on a CentOS 5.3 system and started > looking at the backported kernel headers. I found the following in the > header file > /usr/src/ofa_kernel-1.4.1/kernel_addons/backport/2.6.18-EL5.3/include/

Re: [ofa-general] OFED, the backported header and sg_init_table()

2009-05-03 Thread Jack Morgenstein
On Saturday 02 May 2009 14:46, Bart Van Assche wrote: > Does anyone know why sg_init_table() is defined such that it does nothing in > the backported OFED headers ? > My mistake while doing backports. Will be fixed in rc5. - Jack
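For contrast with a no-op definition, a working sg_init_table() clears the whole table and marks the last entry as the end of the list. This is roughly what mainline does, zeroing the array and then calling sg_mark_end(). The struct and bit values below are simplified, hypothetical stand-ins for struct scatterlist.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified scatterlist entry; the kernel's struct differs. */
struct sketch_sg {
	unsigned long page_link;   /* low bits carry chain/end markers */
	unsigned int  offset;
	unsigned int  length;
};

#define SG_CHAIN_BIT 0x01UL
#define SG_END_BIT   0x02UL

static void sketch_sg_mark_end(struct sketch_sg *sg)
{
	sg->page_link |= SG_END_BIT;
	sg->page_link &= ~SG_CHAIN_BIT;
}

/* What the initializer is expected to do: clear every entry, then flag
 * the last one so list walkers know where to stop. */
static void sketch_sg_init_table(struct sketch_sg *sgl, size_t nents)
{
	if (nents == 0)
		return;
	memset(sgl, 0, sizeof(*sgl) * nents);
	sketch_sg_mark_end(&sgl[nents - 1]);
}
```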

Re: [ofa-general] Re: How to tell what OFED rev a distro derived IB modules?

2009-04-28 Thread Jack Morgenstein
On Monday 27 April 2009 19:55, Jason Gunthorpe wrote: > OFED tests the past - back ports to old distributions and a random > non-upstream collection of patches ontop of that. That is fine for end > users, but.. > That is not quite the case. We do test regression on the base kernel of a given OFE

Re: [ofa-general] Re: How to tell what OFED rev a distro derived IB modules?

2009-04-27 Thread Jack Morgenstein
On Monday 27 April 2009 20:31, Bart Van Assche wrote: > . As an example, commit > 233e70f4228e78eb2f80dc6650f65d3ae3dbf17c was applied to Linus' tree on > October 19, 2008. I could not find any trace of this > patch in the OFED distribution -- not even in > OFED-1.4.1-20090427-0600. That is b

Re: [ofa-general] Re: How to tell what OFED rev a distro derived IB modules?

2009-04-27 Thread Jack Morgenstein
On Monday 27 April 2009 13:46, Moni Shoua wrote: > So, Is there an easy way for upstream kernel users that want user space > functionality? > Why can't they just install OFED? This affects ONLY the infiniband modules, and has undergone extensive QA on lots of platforms. - Jack

Re: [ofa-general] Re: How to tell what OFED rev a distro derived IB modules?

2009-04-26 Thread Jack Morgenstein
On Sunday 26 April 2009 15:58, Sasha Khapyorsky wrote: > On 14:31 Sun 26 Apr , Jack Morgenstein wrote: > > > > > > It should be compatible with the OFED 1.4 userspace. > > > > > Beware -- you should not use OFED userspace with a non-ofed kernel for > &

Re: [ofa-general] Re: How to tell what OFED rev a distro derived IB modules?

2009-04-26 Thread Jack Morgenstein
On Sunday 26 April 2009 21:01, Jason Gunthorpe wrote: > > In general, you should not use OFED userspace libraries with non-OFED > > kernel distributions. > > That is hugely unfriendly and not really 'the linux way'.. > > Jason > I know. I did A LOT of work to avoid incompatibilities. This part

Re: [ofa-general] Re: How to tell what OFED rev a distro derived IB modules?

2009-04-26 Thread Jack Morgenstein
On Friday 24 April 2009 02:48, Jason Gunthorpe wrote: > AFAIK, Ubuntu does not do any work on their IB drivers, so the driver > is stock 2.6.27. > > In principle OFED is supposed to start with an upstream kernel and > backport those drivers to various distributions. OFED 1.3 was using > 2.6.24, OF

Re: [ofa-general] XRC / Libibverbs

2009-04-20 Thread Jack Morgenstein
On Monday 20 April 2009 15:05, Nicolas Morey-Chaisemartin wrote: > Hi, > > I was wondering why in libibverbs XRC is implemented as patches and not > directly in the code? > Are there compatibility problems? > The latest qperf can't be built even with the latest libibverbs as it requires > XRC defin

[ofa-general] [PATCH] ib_mthca: Bring INIT_HCA and other commands timeout into consistency with PRM

2009-04-19 Thread Jack Morgenstein
mands. This patch is an expansion of the INIT_HCA timeout patch submitted by A. Kepner. Signed-off-by: Jack Morgenstein Index: ofed_kernel/drivers/infiniband/hw/mthca/mthca_cmd.c === --- ofed_kernel.orig/drivers/infiniband/hw/

Re: [ofa-general] [PATCH] mthca: increase INIT_HCA timeout

2009-04-19 Thread Jack Morgenstein
On Monday 13 April 2009 21:46, akep...@sgi.com wrote: > Here's a little patch we've been carrying along for a while. > > If the num_qp module parameter is set higher than 2^19 or so, > HCA initialization times out with EBUSY, e.g.: > > ib_mthca: probe of 0031:01:00.0 failed with error -16 > > A

Re: Re: [ofa-general] [PATCH] IB/mlx4: Use pgprot_writecombine() for BlueFlame pages

2009-03-30 Thread Jack Morgenstein
On Sunday 29 March 2009 20:06, Or Gerlitz wrote: > On Sun, Mar 29, 2009 at 7:35 PM, Roland Dreier wrote: > > This should bring mainline kernel small message latency to the same > > level that OFED gets with the PAT support it hacks in. > > Interesting... so the ofed support for blue flame (we are

Re: [ofa-general] [PATCH] IB/mlx4: Use pgprot_writecombine() for BlueFlame pages

2009-03-30 Thread Jack Morgenstein
On Saturday 28 March 2009 01:15, Roland Dreier wrote: > -   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); > +   vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); > Roland, I notice in the 2.6.29 code that: 1. There is a function validate_pat_support()

[ofa-general] Re: Problem in IB network without Switch

2009-03-03 Thread Jack Morgenstein
You need to get in touch with Mellanox Support (FAE) at this point. - Jack On Monday 02 March 2009 17:07, lakshmana swamy wrote: > > Hi Jack, > > > Yes, I connected the cable between two ports of same HCA. Without running > opensmd. > > Now the State is " Initializing" > > I observed

[ofa-general] [PATCH] IB/sa_query: fix update_sm_ah() race condition.

2009-03-02 Thread Jack Morgenstein
. The second schedule_work operation will then find a non-null port->ah_lock, and will simply overwrite it in update_sm_ah -- resulting in an ah leak. Signed-off-by: Jack Morgenstein diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 7863a50..1865049 1006
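The leak described here comes from installing a new address handle without releasing the one already stored. The sketch below shows the fix pattern with reference counting: create the new handle, drop the reference on the old one, then publish the new pointer. struct sketch_ah, update_sm_ah_fixed() and the ah_live counter are hypothetical. In the real code the swap must happen under the per-port lock.

```c
#include <stdlib.h>

/* Hypothetical refcounted address handle. */
struct sketch_ah {
	int refcount;
};

static int ah_live;   /* handles not yet freed, to make leaks visible */

static struct sketch_ah *ah_create(void)
{
	struct sketch_ah *ah = malloc(sizeof(*ah));

	if (ah) {
		ah->refcount = 1;
		ah_live++;
	}
	return ah;
}

static void ah_put(struct sketch_ah *ah)
{
	if (ah && --ah->refcount == 0) {
		free(ah);
		ah_live--;
	}
}

struct sketch_port {
	struct sketch_ah *sm_ah;   /* guarded by a per-port lock in real code */
};

/* Leaky version: a second update silently overwrites the old pointer. */
static void update_sm_ah_leaky(struct sketch_port *port)
{
	port->sm_ah = ah_create();
}

/* Fixed version: release the old handle's reference before installing
 * the new one. */
static void update_sm_ah_fixed(struct sketch_port *port)
{
	struct sketch_ah *new_ah = ah_create();

	if (!new_ah)
		return;
	ah_put(port->sm_ah);
	port->sm_ah = new_ah;
}
```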

[ofa-general] Re: Problem in IB network without Switch

2009-03-02 Thread Jack Morgenstein
On Monday 02 March 2009 16:38, lakshmana swamy wrote: > > HI Jack > > I have updated the firmware of HCA in both the machines, but the status > remains same. > Please have a look at the following outputs. > > What may be the problem ? > Your physical connection is bad. Check your cable

Re: Re: [ofa-general] Unable to get IPoIB working

2009-03-01 Thread Jack Morgenstein
On Monday 26 January 2009 22:44, Chuck Hartley wrote: > Is there some IPoIB debug I can turn on somehow? > On each of the hosts, you can do the following: in file /etc/modprobe.conf add the following line: options ib_ipoib debug_level=15 Then, restart the infiniband driver on both hos

[ofa-general] Re: Problem in IB network without Switch

2009-02-26 Thread Jack Morgenstein
You are running VERY old firmware (from 2004), and moreover, on one host you have 3.0.0, and on the other 3.1.0. You need to upgrade your firmware. Contact your Mellanox FAE (support engineer) for instructions. - Jack > Hi Jack, > > Please find the output of ibstat on both the nodes, . > > [r

[ofa-general] Re: Problem in IB network without Switch

2009-02-26 Thread Jack Morgenstein
On Thursday 26 February 2009 12:59, lakshmana swamy wrote: Please send me the output of console command: ibstat Maybe you have old FW. - Jack > > Hi Jack and Mahesh > > ThanQ for your response. > > I have changed the HCA card as well as IB cables also..Ops no use. > > > How can I

[ofa-general] [PATCH] mlx4_core: Add device IDs for MT25458 10GigE devices

2009-02-26 Thread Jack Morgenstein
Adds device IDs for Mellanox' MT25458 ConnectX+10-GBaseT 10GigE Ethernet devices. Signed-off-by: Jack Morgenstein diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 6ef2490..84db33b 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -1230,6 +1230,8 @@ s

Re: [ofa-general] Problem in IB network without Switch

2009-02-26 Thread Jack Morgenstein
"DOWN" means that you do not have a physical link between the ports. Check your cables -- they may be bad, or badly inserted. - Jack On Thursday 26 February 2009 08:38, lakshmana swamy wrote: > > Hi All > > I have been trying to enable the IPoIB communication between two machines. > The

Re: [ofa-general] Race condition in core/sysfs.c (kernel panic) when unloading the driver

2009-02-23 Thread Jack Morgenstein
On Monday 23 February 2009 20:31, Roland Dreier wrote: > > I'm not sure that it does. This does not make sysfs access atomic wrt > module unloading. > > I think an app can still lose its timeslice while inside the sysfs > access, and module > > unload can still occur while the app is waiting

Re: [ofa-general] Race condition in core/sysfs.c (kernel panic) when unloading the driver

2009-02-23 Thread Jack Morgenstein
On Monday 23 February 2009 06:40, Roland Dreier wrote: > Oh I see... we leave the sysfs stuff around way too long, since we want > to use it for tracking the lifetime of our class device.  the patch > below fixes things for me here... there's still room for substantial > cleanup but I think this ge

Re: [ofa-general] build warnings on rhel4 U6

2009-02-22 Thread Jack Morgenstein
On Friday 06 February 2009 21:39, Brian J. Murrell wrote: > I get these warnings trying to build with RHEL4U6 and ofa_kernel from OFED > 1.4: > > include/linux/jbd.h:1204:1: warning: "assert_spin_locked" redefined > In file included from include/linux/wait.h:25, > from include/li

[ofa-general] [PATCH] ib_core: avoid race condition between sysfs access and low-level module unload

2009-02-22 Thread Jack Morgenstein
x. Signed-off-by: Jack Morgenstein --- Roland, I think this patch is a reasonable solution to the sysfs problem of a low-level driver module being unloaded while sysfs is being accessed for the device. ib_unregister_device() is always called before the device driver frees up its resources.

Re: [ofa-general] Race condition in core/sysfs.c (kernel panic) when unloading the driver

2009-02-22 Thread Jack Morgenstein
On Sunday 22 February 2009 09:15, Roland Dreier wrote: > > I ran on RHEL5.2 ... > > I suspect that at some point in the 2+ years since 2.6.18 more locking > was added to sysfs so that this race no longer exists. You could try > and see if my test (add a sleep to the show method and make sure you

Re: [ofa-general] Race condition in core/sysfs.c (kernel panic) when unloading the driver

2009-02-21 Thread Jack Morgenstein
On Friday 20 February 2009 08:50, Roland Dreier wrote: > What test are you using to hit this race?  Are you using a distro kernel > with OFED? > I ran on RHEL5.2, with a ConnectX card, using the following test (source given at the end of this post): 1. Start the driver. 2. In one console window,

[ofa-general] Re: [PATCH] IPoIB: In unicast_arp, do path_free only for newly-created paths

2009-02-17 Thread Jack Morgenstein
On Wednesday 18 February 2009 00:54, Roland Dreier wrote: >  > Signed-off-by: Jack Morgenstein >  > Signed-off-by: Moni Shua > > This doesn't make any sense... Moni was not involved in sending this > patch at all, and in any case since you are sending the patch your s

[ofa-general] Race condition in core/sysfs.c (kernel panic) when unloading the driver

2009-02-17 Thread Jack Morgenstein
We have found a race condition in sysfs.c which occurs when unloading low-level modules (e.g., mlx4_ib) in the driver. Specifically: Although the kernel takes reference counts on sysfs files, it does not take such counts on modules which implement attribute reads. For example, we have: static s

[ofa-general] [PATCH] IPoIB: In unicast_arp, do path_free only for newly-created paths

2009-02-17 Thread Jack Morgenstein
not free an existing path -- just leave it in the list as-is (i.e., with its valid flag cleared). Thanks to Yossi Etigin of Voltaire for identifying the problem flow which caused the kernel crash. Signed-off-by: Jack Morgenstein Signed-off-by: Moni Shua --- Roland, I ran checkpatch.pl on this
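The rule this patch describes (free the path only if this call created it; leave a pre-existing path on the list with its valid flag cleared) can be illustrated with a minimal userspace sketch. Everything in it is hypothetical and simplified: struct path_entry, struct path_list, resolve_path(), and the two simulated query functions stand in for the IPoIB path list and path_rec_start(). It is not the kernel code or the submitted patch, and it omits the priv->lock locking entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the IPoIB path list (not kernel structures). */
struct path_entry {
    unsigned int dgid;          /* destination key, simplified from a 128-bit GID */
    bool valid;                 /* cleared when paths are flushed (e.g. opensm up/down) */
    struct path_entry *next;
};

struct path_list {
    struct path_entry *head;
};

/* Simulated "start path record query": returns 0 or a negative errno-style value. */
typedef int (*query_fn)(struct path_entry *p);

static int query_ok(struct path_entry *p)     { (void)p; return 0; }
static int query_eagain(struct path_entry *p) { (void)p; return -11; /* -EAGAIN */ }

static struct path_entry *path_find(struct path_list *l, unsigned int dgid)
{
    for (struct path_entry *p = l->head; p; p = p->next)
        if (p->dgid == dgid)
            return p;
    return NULL;
}

/*
 * Sketch of the rule: on query failure, free the path only if it was created
 * in this call (it was never linked into the list). A path that was already
 * on the list is left there, still marked invalid, so no other holder of a
 * pointer to it is left with freed memory.
 */
static int resolve_path(struct path_list *l, unsigned int dgid, query_fn start_query)
{
    assert(l != NULL && start_query != NULL);

    struct path_entry *p = path_find(l, dgid);
    bool new_path = false;

    if (p && p->valid)
        return 0;                       /* usable path, nothing to resolve */

    if (!p) {
        p = calloc(1, sizeof(*p));
        if (!p)
            return -12;                 /* -ENOMEM */
        p->dgid = dgid;
        new_path = true;
    }

    int err = start_query(p);
    if (err) {
        if (new_path)
            free(p);                    /* never listed, safe to free */
        return err;                     /* existing path stays on the list */
    }

    if (new_path) {                     /* link only after the query started */
        p->next = l->head;
        l->head = p;
    }
    /* valid would be set later, when the (simulated) query completes */
    return 0;
}
```

Linking a new path only after its query has started means the failure branch never has to unlink anything; the only decision left is whether the path belongs to this call.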

Re: [ofa-general] Re: Kernel panic in IPoIB stability testing

2009-02-04 Thread Jack Morgenstein
On Wednesday 04 February 2009 18:16, Moni Shoua wrote: > This one looks good  to me. > Are you going to make a patch and submit it? > > I think it would be best if you run the same test on the patched IPoIB before > submission. > Do you agree? > I'll do a patch tomorrow. We'll run the test over

Re: [ofa-general] Re: Kernel panic in IPoIB stability testing

2009-02-04 Thread Jack Morgenstein
On Wednesday 04 February 2009 17:45, Moni Shoua wrote: > Besides the locking issue that I hadn't thought about yet, this fix > looks like the right thing to do. > But what if we leave the path without freeing it even if path_rec_start() > fails? > This would leave a path which is not valid in

Re: [ofa-general] Re: Kernel panic in IPoIB stability testing

2009-02-04 Thread Jack Morgenstein
On Wednesday 04 February 2009 15:33, Moni Shoua wrote: > Isn't the fix just as simple as this? > > void ipoib_mark_paths_invalid(struct net_device *dev) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); > struct ipoib_path *path, *tp; > > spin_lock_irq(&priv->lock); >

[ofa-general] Re: Kernel panic in IPoIB stability testing

2009-02-03 Thread Jack Morgenstein
On Wednesday 04 February 2009 08:46, Jack Morgenstein wrote: > On Tuesday 03 February 2009 19:56, Yossi Etigin wrote: > > I think it comes from unicast_arp_send. > > Consider this scenario: > > - paths are flushed (opensm up/down). > > - unicast_arp_send() is called wit

[ofa-general] Re: Kernel panic in IPoIB stability testing

2009-02-03 Thread Jack Morgenstein
On Tuesday 03 February 2009 19:56, Yossi Etigin wrote: > I think it comes from unicast_arp_send. > Consider this scenario: > - paths are flushed (opensm up/down). > - unicast_arp_send() is called with a path in priv->path_list. path->valid is 0. > - path_rec_start() fails with -EAGAIN (-11) beca

[ofa-general] Kernel panic in IPoIB stability testing

2009-02-03 Thread Jack Morgenstein
We saw the following kernel panic when testing ipoib stability intensively by simultaneously (i.e., in separate processes, with random wait intervals) doing: - ifconfig up/down - opensm up/down - ipoib ping - arp delete - driver up/down ib0: ib_sa_path_rec_get failed: -11 ib0: ib_sa_path_rec_get

[ofa-general] Re: IPoIB kernel Oops -- possible race condition identified.

2009-01-28 Thread Jack Morgenstein
On Wednesday 28 January 2009 20:53, Roland Dreier wrote: > > - priv->mcast_mtu = > IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu)); > > + spin_lock_irq(&priv->lock); > > + if (priv->broadcast) > > + priv->mcast_mtu = > IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcas

[ofa-general] Re: IPoIB kernel Oops -- possible race condition identified.

2009-01-27 Thread Jack Morgenstein
in_task. There is a race whereby the ipoib broadcast pointer may be set to NULL by flush while the join task is being started. This protects the broadcast pointer access via a spinlock. If the pointer is indeed NULL, we set the mcast_mtu value to the current admin_mtu value -- since
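The pattern described above (read the broadcast pointer only while holding the lock, and fall back to admin_mtu when flush has already cleared it) can be sketched in userspace. This assumes a pthread mutex as a stand-in for priv->lock and spin_lock_irq(); the struct names, fields, and sketch_* functions are hypothetical, and SKETCH_ENCAP_LEN assumes the 4-byte encapsulation header that IPOIB_UD_MTU() subtracts. It illustrates the idea only and is not the submitted patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumption: 4-byte IPoIB encapsulation header, as subtracted by IPOIB_UD_MTU(). */
#define SKETCH_ENCAP_LEN 4

struct mcast_sketch {
    int ib_mtu;                        /* MTU from the broadcast group's member record, bytes */
};

struct priv_sketch {
    pthread_mutex_t lock;              /* stands in for priv->lock / spin_lock_irq() */
    struct mcast_sketch *broadcast;    /* may be cleared by a concurrent flush */
    int admin_mtu;
    int mcast_mtu;
};

/* Flush side: clear the pointer under the same lock; the caller frees the
 * returned object only after it is no longer reachable through priv. */
static struct mcast_sketch *sketch_flush(struct priv_sketch *priv)
{
    assert(priv != NULL);
    pthread_mutex_lock(&priv->lock);
    struct mcast_sketch *old = priv->broadcast;
    priv->broadcast = NULL;
    pthread_mutex_unlock(&priv->lock);
    return old;
}

/* Join-task side: test and dereference broadcast inside one critical section,
 * so flush cannot clear it between the NULL check and the read. */
static void sketch_update_mcast_mtu(struct priv_sketch *priv)
{
    assert(priv != NULL);
    pthread_mutex_lock(&priv->lock);
    if (priv->broadcast)
        priv->mcast_mtu = priv->broadcast->ib_mtu - SKETCH_ENCAP_LEN;
    else
        priv->mcast_mtu = priv->admin_mtu;   /* fallback once flush has run */
    pthread_mutex_unlock(&priv->lock);
}
```

The lock only closes the race if the flush path clears the pointer under the same lock, as sketch_flush() does; checking for NULL without the lock would still leave a window between the check and the dereference.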

[ofa-general] IPoIB kernel Oops -- possible race condition identified.

2009-01-26 Thread Jack Morgenstein
The following Oops occurred several times on an X86 host when unloading the driver: (console command sequence: /etc/init.d/openibd start opensm & pkill -2 opensm /etc/init.d/openibd stop ) IP: [] :ib_ipoib:ipoib_mcast_join_ta

[ofa-general] Kernel panic in IPoIB (RHEL5.1)

2009-01-22 Thread Jack Morgenstein
We saw the following kernel panic when testing ipoib stability intensively by simultaneously (i.e., in separate processes, with random wait intervals) doing: - ifconfig up/down - opensm up/down - ipoib ping - arp delete - driver up/down Does anyone have ideas as to what might have happened? (the

[ofa-general] Re: [PATCH] mlx4_ib: fix for bugzilla 1383 (LSO packet processing)

2009-01-18 Thread Jack Morgenstein
On Friday 16 January 2009 22:10, Roland Dreier wrote: > So I'll merge the patch with the wmb() there, and you can convince me to > get rid of it later if my reasoning is wrong. > We did performance testing on your version of the patch, and my version, and there was no statistically significant

Re: [ofa-general] Re: [PATCH] mlx4_ib: fix for bugzilla 1383 (LSO packet processing)

2009-01-18 Thread Jack Morgenstein
On Friday 16 January 2009 22:02, Roland Dreier wrote: > OK, I think I'm going to merge my version of the patch. If there really > is a performance penalty I'd rather move the mlx transport stuff > out-of-line first rather than make the code too unreadable with gotos and > duplication etc. > Roland

Re: [ofa-general] iWARP: Zero STag, OFED 1.3 vs 1.4

2009-01-14 Thread Jack Morgenstein
On Wednesday 14 January 2009 12:21, Philip Frey1 wrote: > Hello, > > I recently upgraded from OFED 1.3 to 1.4 and the behaviour of an STag of > zero seems to have changed. > Did you try sending with send_wr.sg_list = NULL; send_wr.num_sge = 0; ? (if this works, it should resul

RE: [ofa-general] [PATCH] mlx4_ib: Fix dispatch of IB_EVENT_LID_CHANGE

2009-01-14 Thread Jack Morgenstein
rated, check for CLIENT REREG. Thoughts? -----Original Message----- > From: Moni Shoua [mailto:mo...@voltaire.com] > Sent: Wednesday, January 14, 2009 6:21 PM > To: Roland Dreier > Cc: Jack Morgenstein; Olga Stern; Yossi Etigin; OpenFabrics General > Subject: Re: [ofa-general] [PATCH] m
