Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

2009-04-30 Thread Tziporet Koren

Rupert Dance wrote:


Hi Tziporet,

Thanks for the update. Can we also get bug 1287 
https://bugs.openfabrics.org/show_bug.cgi?id=1287 resolved? The OFA 
Interop Event has begun, and this bug causes a failure that will have to be
reported as part of the results on the OFA Logo site. We have both
vendors and end users who seem to be concerned about IPoIB dropping
packets, and I think we should address it; it is a known problem
discovered approximately one year ago.



Hi Rupert,
We discussed this bug at the Apr-6 meeting, and I said that Mellanox
will not be able to assign anyone to debug it now.

QLogic said they would see whether they can assign someone.
I am cc'ing John Russo from QLogic; maybe they can help.

Tziporet




___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

2009-04-30 Thread Tziporet Koren

Davis, Arlin R wrote:
I am running unit tests with the dapl fix (#1613) now and can 
have a new package later tonight. No need to delay for this bug.


  

We are building RC4 now.
If needed, we will do an RC5 next week too.
Let's close this in our Monday meeting.

Tziporet



Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

2009-04-30 Thread Ralph Campbell
Comment #8 on bug 1287 already identified the problem and a solution.
I think we would need to change the default value of
net.ipv4.neigh.ib0.unres_qlen to make the fix permanent.
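
A minimal sketch of the setting mentioned above, assuming a Linux host whose IPoIB interface is named `ib0`. It only shows how the sysctl key maps onto its `/proc/sys` file and how a persistent `/etc/sysctl.conf` line would look. The helper names are hypothetical, and the queue length is a caller-supplied placeholder, not the value proposed in comment #8.

```python
from pathlib import PurePosixPath


def neigh_proc_path(iface: str, param: str = "unres_qlen", family: str = "ipv4") -> PurePosixPath:
    """Return the /proc/sys file backing net.<family>.neigh.<iface>.<param>.

    Hypothetical helper for illustration; it follows the standard Linux
    /proc/sys/net/<family>/neigh/<iface>/<param> layout.
    """
    return PurePosixPath("/proc/sys/net", family, "neigh", iface, param)


def neigh_sysctl_key(iface: str, param: str = "unres_qlen", family: str = "ipv4") -> str:
    """Return the dotted sysctl key for a per-interface neighbour setting.

    A dot inside the interface name (e.g. an IPoIB child such as ib0.8001)
    is written as a slash so it is not mistaken for a key separator.
    """
    return ".".join(["net", family, "neigh", iface.replace(".", "/"), param])


def sysctl_conf_line(iface: str, value: int, param: str = "unres_qlen") -> str:
    """Format a line for /etc/sysctl.conf so the setting survives a reboot.

    The value is whatever the caller passes; it is not a recommendation.
    """
    if value < 0:
        raise ValueError("queue length must be non-negative")
    return f"{neigh_sysctl_key(iface, param)} = {value}"


def read_current(iface: str, param: str = "unres_qlen") -> int:
    """Read the live value (Linux only; the interface must already exist)."""
    with open(neigh_proc_path(iface, param)) as f:
        return int(f.read().strip())
```

For example, `sysctl_conf_line("ib0", 128)` returns `net.ipv4.neigh.ib0.unres_qlen = 128`, where 128 is an arbitrary placeholder. A per-interface entry like this is typically only applied if `ib0` already exists when the sysctl settings are loaded.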

On Thu, 2009-04-30 at 04:12 -0700, Tziporet Koren wrote:
 Rupert Dance wrote:
 
  Hi Tziporet,
 
  Thanks for the update. Can we also get bug 1287 
  https://bugs.openfabrics.org/show_bug.cgi?id=1287 resolved? The OFA 
  Interop Event has begun, and this bug causes a failure that will have to be
  reported as part of the results on the OFA Logo site. We have both
  vendors and end users who seem to be concerned about IPoIB dropping
  packets, and I think we should address it; it is a known problem
  discovered approximately one year ago.
 
 Hi Rupert,
 We discussed this bug at the Apr-6 meeting, and I said that Mellanox
 will not be able to assign anyone to debug it now.
 QLogic said they would see whether they can assign someone.
 I am cc'ing John Russo from QLogic; maybe they can help.
 
 Tziporet
 
 
 
 


Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

2009-04-29 Thread Jeff Becker
Hi Tziporet. My update is that I believe I know what is causing bug
1607, and I'm working on a fix. Thanks.

-jeff

Steve Wise wrote:
 Status update:

 I cleaned up some NFSRDMA server crashes that happen when there are 
 asynchronous WR failures.  That might help Vu figure out 1571.  I think 
 there is a FW issue causing the async failure.  But the code shouldn't 
 crash anymore with my latest fix.

 But I'd also like to get 1613 and 1616 into ofed-1.4.1:

 1613: dapl regression that UNH uncovered.  Arlin has a fix.
 1616: nfsrdma ppc64 issue uncovered today.  Hopefully we can nail this
 one by EOB Friday.

 Should we crank RC4 tomorrow and plan an RC5?  Or hold off for a few 
 more days with RC4? 


 Steve.



 Tziporet Koren wrote:
   
 Jon Mason wrote:
 
 On Mon, Apr 27, 2009 at 05:43:05PM +0300, Tziporet Koren wrote:
  
   
 Hi All
 Since there are still a few open critical bugs, we are delaying the OFED 1.4.1-RC4
 build to Thursday.
 Note that we are on vacation on Wed this week (Israel Independence Day).

 The bugs that must be fixed:
 1607  blo  SLES  jeffrey.c.bec...@nasa.gov    kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
 1609  cri  RHEL  sw...@opengridcomputing.com  kernel panic running iozone on x86 system
 
 
 This was fixed by the patch Steve pushed on Friday.  I'll close the bug
 for him.

   
   
 Well, it's too late now for us to build and test it.
 What about bug 1571?

 Tziporet


RE: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

2009-04-29 Thread Davis, Arlin R

I am running unit tests with the dapl fix (#1613) now and can 
have a new package later tonight. No need to delay for this bug.

-arlin

-----Original Message-----
From: Steve Wise [mailto:sw...@opengridcomputing.com] 
Sent: Wednesday, April 29, 2009 2:46 PM
To: Tziporet Koren
Cc: Jon Mason; Vu Pham; ewg@lists.openfabrics.org; Davis, 
Arlin R; Vladimir Sokolovsky
Subject: Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

Status update:

I cleaned up some NFSRDMA server crashes that happen when there are
asynchronous WR failures.  That might help Vu figure out 1571.  I think
there is a FW issue causing the async failure.  But the code shouldn't
crash anymore with my latest fix.

But I'd also like to get 1613 and 1616 into ofed-1.4.1:

1613: dapl regression that UNH uncovered.  Arlin has a fix.
1616: nfsrdma ppc64 issue uncovered today.  Hopefully we can nail this
one by EOB Friday.

Should we crank RC4 tomorrow and plan an RC5?  Or hold off for a few 
more days with RC4? 


Steve.



Tziporet Koren wrote:
 Jon Mason wrote:
 On Mon, Apr 27, 2009 at 05:43:05PM +0300, Tziporet Koren wrote:
  
 Hi All
 Since there are still a few open critical bugs, we are delaying the OFED 1.4.1-RC4
 build to Thursday.
 Note that we are on vacation on Wed this week (Israel Independence Day).

 The bugs that must be fixed:
 1607  blo  SLES  jeffrey.c.bec...@nasa.gov    kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
 1609  cri  RHEL  sw...@opengridcomputing.com  kernel panic running iozone on x86 system
 

 This was fixed by the patch Steve pushed on Friday.  I'll close the bug
 for him.

   
 Well, it's too late now for us to build and test it.
 What about bug 1571?

 Tziporet


[ewg] OFED 1.4.1 RC4 is delayed to Thursday

2009-04-27 Thread Tziporet Koren
Hi All
Since there are still a few open critical bugs, we are delaying the OFED 1.4.1-RC4
build to Thursday.
Note that we are on vacation on Wed this week (Israel Independence Day).

The bugs that must be fixed:
1607  blo  SLES  jeffrey.c.bec...@nasa.gov    kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
1609  cri  RHEL  sw...@opengridcomputing.com  kernel panic running iozone on x86 system
1571  cri  RHEL  v...@mellanox.com            nfsrdma server crash @test5 connectathon basic test,

Steve, Vu and Jeff, I hope you will be able to fix these bugs by the end of
Wed so we can do the RC4 build on Thursday morning (Israel time).

Thanks,
Tziporet

Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

2009-04-27 Thread Jon Mason
On Mon, Apr 27, 2009 at 05:43:05PM +0300, Tziporet Koren wrote:
 Hi All
 Since there are still a few open critical bugs, we are delaying the OFED 1.4.1-RC4
 build to Thursday.
 Note that we are on vacation on Wed this week (Israel Independence Day).
 
 The bugs that must be fixed:
 1607  blo  SLES  jeffrey.c.bec...@nasa.gov    kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
 1609  cri  RHEL  sw...@opengridcomputing.com  kernel panic running iozone on x86 system

This was fixed by the patch Steve pushed on Friday.  I'll close the bug
for him.

Thanks,
Jon


 1571  cri  RHEL  v...@mellanox.com            nfsrdma server crash @test5 connectathon basic test,
 
 Steve, Vu and Jeff, I hope you will be able to fix these bugs by the end of
 Wed so we can do the RC4 build on Thursday morning (Israel time).
 
 Thanks,
 Tziporet



Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

2009-04-27 Thread Tziporet Koren

Jon Mason wrote:

On Mon, Apr 27, 2009 at 05:43:05PM +0300, Tziporet Koren wrote:
  

Hi All
Since there are still a few open critical bugs, we are delaying the OFED 1.4.1-RC4
build to Thursday.
Note that we are on vacation on Wed this week (Israel Independence Day).

The bugs that must be fixed:
1607  blo  SLES  jeffrey.c.bec...@nasa.gov    kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
1609  cri  RHEL  sw...@opengridcomputing.com  kernel panic running iozone on x86 system



This was fixed by the patch Steve pushed on Friday.  I'll close the bug
for him.

  

Well, it's too late now for us to build and test it.
What about bug 1571?

Tziporet


RE: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

2009-04-27 Thread Rupert Dance
Hi Tziporet,

 

Thanks for the update. Can we also get bug 1287
https://bugs.openfabrics.org/show_bug.cgi?id=1287 resolved? The OFA
Interop Event has begun, and this bug causes a failure that will have to be
reported as part of the results on the OFA Logo site. We have both vendors
and end users who seem to be concerned about IPoIB dropping packets, and I
think we should address it; it is a known problem discovered approximately
one year ago.

 

Thanks,
Rupert

 

From: ewg-boun...@lists.openfabrics.org
[mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Tziporet Koren
Sent: Monday, April 27, 2009 10:43 AM
To: ewg@lists.openfabrics.org
Cc: Vu Pham
Subject: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

 

Hi All

Since there are still a few open critical bugs, we are delaying the OFED 1.4.1-RC4 build
to Thursday.

Note that we are on vacation on Wed this week (Israel Independence Day).

The bugs that must be fixed:

1607  blo  SLES  jeffrey.c.bec...@nasa.gov    kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
1609  cri  RHEL  sw...@opengridcomputing.com  kernel panic running iozone on x86 system
1571  cri  RHEL  v...@mellanox.com            nfsrdma server crash @test5 connectathon basic test,

Steve, Vu and Jeff, I hope you will be able to fix these bugs by the end of Wed
so we can do the RC4 build on Thursday morning (Israel time).

Thanks,
Tziporet
