[ewg] Re: [ofa-general] Re: dapl attribute bug

2009-04-29 Thread Steve Wise
Bug 1613 opened to track this.  


I think we need this for ofed-1.4.1.

Steve.


Steve Wise wrote:

Hey Arlin,

Did this ever get fixed?

I think UNH is seeing this issue still.



Steve Wise wrote:

Davis, Arlin R wrote:
 

 
The DAPL dat_ia_attr->max_lmr_block_size is a u32, yet the dapl 
code maps this to the linux ib_device_attr->max_mr_size which is u64.


This causes dapltest to fail in some cases when running over 
chelsio which sets max_mr_size to 0x100000000 (4GB).  The dapl code 
truncates the value to 0. See dapl/openib_cma/dapl_ib_util.c.


I'm not sure what the fix should be, but maybe the dapl code should 
set anything over 32 bits to 0xffffffff?





This attribute changed with DAT 2.0 to match the 32-bit ibv_sge
length field. Since there are no direct max lmr segments mappings
I will need to add some checks when setting max_lmr_block_size from
max_mr_size. Thanks.

-arlin
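
For reference, a minimal sketch of the kind of check discussed above,
assuming the approach is to saturate at the largest u32 value instead of
letting the u64 wrap. The helper name clamp_mr_size_to_u32 is hypothetical
and is not taken from dapl_ib_util.c; the actual fix may differ.

```c
#include <stdint.h>

/*
 * Hypothetical helper (an assumption, not the actual dapl code):
 * map a 64-bit verbs max_mr_size onto the 32-bit range used by
 * max_lmr_block_size.
 */
static uint32_t clamp_mr_size_to_u32(uint64_t max_mr_size)
{
    /* A plain (uint32_t) cast would turn 0x100000000 into 0. */
    if (max_mr_size > UINT32_MAX)
        return UINT32_MAX;      /* saturate at 0xffffffff */
    return (uint32_t)max_mr_size;
}
```

With a check like this, a device reporting a max_mr_size of 0x100000000
(4GB) would yield 0xffffffff rather than 0, and smaller values pass
through unchanged.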


I'll test your fix when it's ready.  Lemme know.


Steve.

___
general mailing list
gene...@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit 
http://openib.org/mailman/listinfo/openib-general




___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] [GIT PULL ofed-1.4.1] iw_cxgb3/nfsrdma fixes

2009-04-29 Thread Steve Wise

Vlad,

Please pull from:

git://v...@sofa.openfabrics.org/~swise/scm/ofed-1.4.git ofed_1_4

You'll get these fixes from Jon and me:


Author: Jon Mason j...@opengridcomputing.com
Date:   Wed Apr 29 16:03:12 2009 -0500

   NFS-RDMA: DMA direction error on NFS server
  
   This patch fixes an issue I am seeing on ppc64 when running on that
   platform as a NFS Server.  The incorrect DMA direction causes an EEH
   event.
  
   This patch has already been sent upstream for inclusion into 2.6.30.
  
   Signed-Off-By: Jon Mason j...@opengridcomputing.com


commit 1f3248b3942427c437db26fec8297c754f085494
Author: Steve Wise sw...@opengridcomputing.com
Date:   Wed Apr 29 16:00:43 2009 -0500

   RDMA/cxgb3: Pull in sq flush fix.
  
   Signed-off-by: Steve Wise sw...@opengridcomputing.com


commit fde3500748351e0b431ebd667f03a6d95c045333
Author: Steve Wise sw...@opengridcomputing.com
Date:   Wed Apr 29 16:00:38 2009 -0500

   NFSRDMA: Pull in error paths fix.
  
   Signed-off-by: Steve Wise sw...@opengridcomputing.com


commit f3a84550b84aa8262821b2114ad353a2b144668c
Author: Steve Wise sw...@opengridcomputing.com
Date:   Sun Apr 26 13:44:59 2009 -0500

   NFSRDMA: pull in frmr iova_start truncation fix.
  
   Signed-off-by: Steve Wise sw...@opengridcomputing.com





Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

2009-04-29 Thread Jeff Becker
Hi Tziporet. My update is that I believe I know what is causing bug
1607, and I'm working on a fix. Thanks.

-jeff

Steve Wise wrote:
 Status update:

 I cleaned up some NFSRDMA server crashes that happen when there are 
 asynchronous WR failures.  That might help Vu figure out 1571.  I think 
 there is a FW issue causing the async failure.  But the code shouldn't 
 crash anymore with my latest fix.

 But I'd also like 1613, 1616 into ofed-1.4.1:

 1613: dapl regression that UNH uncovered.  Arlin has a fix.
 1616: nfsrdma ppc64 issue uncovered today.  Hopefully we can nail this 
 one by EOB Friday.

 Should we crank RC4 tomorrow and plan an RC5?  Or hold off for a few 
 more days with RC4? 


 Steve.



 Tziporet Koren wrote:
   
 Jon Mason wrote:
 
 On Mon, Apr 27, 2009 at 05:43:05PM +0300, Tziporet Koren wrote:
  
   
 Hi All
 Since there are still a few open critical bugs, we are delaying the
 OFED 1.4.1-RC4 build to Thursday.
 Note that we are on vacation on Wed this week (Israel Independence Day)

 The bugs that must be fixed:
 1607  blo  SLES  jeffrey.c.bec...@nasa.gov    kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
 1609  cri  RHEL  sw...@opengridcomputing.com  kernel panic running iozone on x86 system
 
 
 This was fixed by the patch Steve pushed on Friday.  I'll close the bug
 for him.

   
   
 Well - it's too late now for us to build and test it.
 What about bug 1571 ?

 Tziporet


RE: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

2009-04-29 Thread Davis, Arlin R

I am running unit tests with the dapl fix (#1613) now and can 
have a new package later tonight. No need to delay for this bug.

-arlin

-----Original Message-----
From: Steve Wise [mailto:sw...@opengridcomputing.com] 
Sent: Wednesday, April 29, 2009 2:46 PM
To: Tziporet Koren
Cc: Jon Mason; Vu Pham; ewg@lists.openfabrics.org; Davis, 
Arlin R; Vladimir Sokolovsky
Subject: Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

Status update:

I cleaned up some NFSRDMA server crashes that happen when there are 
asynchronous WR failures.  That might help Vu figure out 1571.  I think 
there is a FW issue causing the async failure.  But the code shouldn't 
crash anymore with my latest fix.

But I'd also like 1613, 1616 into ofed-1.4.1:

1613: dapl regression that UNH uncovered.  Arlin has a fix.
1616: nfsrdma ppc64 issue uncovered today.  Hopefully we can nail this 
one by EOB Friday.

Should we crank RC4 tomorrow and plan an RC5?  Or hold off for a few 
more days with RC4? 


Steve.



Tziporet Koren wrote:
 Jon Mason wrote:
 On Mon, Apr 27, 2009 at 05:43:05PM +0300, Tziporet Koren wrote:
  
 Hi All
 Since there are still a few open critical bugs, we are delaying the
 OFED 1.4.1-RC4 build to Thursday.
 Note that we are on vacation on Wed this week (Israel Independence Day)

 The bugs that must be fixed:
 1607  blo  SLES  jeffrey.c.bec...@nasa.gov    kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
 1609  cri  RHEL  sw...@opengridcomputing.com  kernel panic running iozone on x86 system
 

 This was fixed by the patch Steve pushed on Friday.  I'll close the bug
 for him.

   
 Well - it's too late now for us to build and test it.
 What about bug 1571 ?

 Tziporet


[ewg] RE: Do you still need the local SA in OFED 1.5?

2009-04-29 Thread Sean Hefty
Subject: Do you still need the local SA in OFED 1.5?

The RDMA/IB CMs do not scale without PR caching or hard-coding PR parameters.

I'm personally fine removing it from OFED.  MPI and other applications are
working around SA scaling issues by connecting over sockets anyway. 
