Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday
Rupert Dance wrote:
> Hi Tziporet,
> Thanks for the update. Can we also get bug 1287 https://bugs.openfabrics.org/show_bug.cgi?id=1287 resolved? The OFA Interop Event has begun and this causes a failure that will have to be reported as part of the results on the OFA Logo site. We have both vendors and end users who seem to be concerned about IPoIB dropping packets and I think we should address it - it is a known problem discovered approximately one year ago.

Hi Rupert,

We discussed this bug at the Apr-6 meeting, and I said that Mellanox will not be able to assign anyone to debug it now. Qlogic said they will see if they can assign someone.
I have cc'd John Russo from Qlogic; maybe they can help.

Tziporet
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday
Davis, Arlin R wrote:
> I am running unit tests with the dapl fix (#1613) now and can have a new package later tonight. No need to delay for this bug.

We are building RC4 now. If needed, we will do an RC5 next week too.
Let's close this in our Monday meeting.

Tziporet
Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday
Comment #8 already identified the problem and a solution. I think we would need to change the default value for net.ipv4.neigh.ib0.unres_qlen to make the fix permanent.

On Thu, 2009-04-30 at 04:12 -0700, Tziporet Koren wrote:
> Rupert Dance wrote:
>> Hi Tziporet,
>> Thanks for the update. Can we also get bug 1287 https://bugs.openfabrics.org/show_bug.cgi?id=1287 resolved? The OFA Interop Event has begun and this causes a failure that will have to be reported as part of the results on the OFA Logo site. We have both vendors and end users who seem to be concerned about IPoIB dropping packets and I think we should address it - it is a known problem discovered approximately one year ago.
>
> Hi Rupert,
> We have discussed this bug on Apr-6 meeting and I said that Mellanox will not be able to assign anyone to debug this now. Qlogic said they will see if they can assign someone.
> I cc John Russo from Qlogic - maybe they can help
>
> Tziporet
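For readers less familiar with the setting: unres_qlen caps how many packets the kernel queues for a neighbour (ARP) entry that is still unresolved, and packets beyond that cap are dropped. A sysctl key such as net.ipv4.neigh.ib0.unres_qlen maps to the file /proc/sys/net/ipv4/neigh/ib0/unres_qlen, and a "key = value" line in /etc/sysctl.conf is the usual way to reapply a value at boot. The sketch below only illustrates that mapping. It assumes the IPoIB interface is named ib0 and that its name contains no dots. The helper names are hypothetical and are not part of OFED. The value 16 is purely illustrative; the value recommended in comment #8 of bug 1287 is not quoted in this thread. Writing the real /proc/sys file requires root.

```python
import tempfile
from pathlib import Path

# Hypothetical helpers (not part of OFED): they only show how the sysctl key
# net.ipv4.neigh.<iface>.<param> maps onto /proc/sys and /etc/sysctl.conf.


def neigh_sysctl_path(iface: str, param: str = "unres_qlen",
                      root: str = "/proc/sys") -> Path:
    """Return the /proc/sys file backing net.ipv4.neigh.<iface>.<param>."""
    return Path(root) / "net" / "ipv4" / "neigh" / iface / param


def sysctl_conf_line(iface: str, value: int, param: str = "unres_qlen") -> str:
    """Build a persistent /etc/sysctl.conf entry for the setting."""
    if "." in iface:
        # Dotted names (e.g. IPoIB child interfaces like ib0.8001) need the
        # slash-separated sysctl key form, which this sketch does not handle.
        raise ValueError("interface names containing '.' are not supported")
    if value < 0:
        raise ValueError("queue length must be non-negative")
    return f"net.ipv4.neigh.{iface}.{param} = {value}"


def read_value(iface: str, param: str = "unres_qlen",
               root: str = "/proc/sys") -> int:
    """Read the current runtime value of the setting."""
    return int(neigh_sysctl_path(iface, param, root).read_text().strip())


def write_value(iface: str, value: int, param: str = "unres_qlen",
                root: str = "/proc/sys") -> None:
    """Set the runtime value. Against the real /proc/sys this needs root,
    and the change is lost at reboot unless a sysctl.conf line is installed."""
    neigh_sysctl_path(iface, param, root).write_text(f"{value}\n")


if __name__ == "__main__":
    # Demo against a scratch directory instead of the real /proc/sys.
    with tempfile.TemporaryDirectory() as scratch:
        neigh_sysctl_path("ib0", root=scratch).parent.mkdir(parents=True)
        write_value("ib0", 16, root=scratch)    # 16 is an illustrative value only
        print(read_value("ib0", root=scratch))  # prints 16
        print(sysctl_conf_line("ib0", 16))      # net.ipv4.neigh.ib0.unres_qlen = 16
```

One caveat on "making it permanent": the per-interface key only exists once ib0 has been created, so a boot-time sysctl.conf entry for ib0 may be applied too early. The net.ipv4.neigh.default.* keys (for example, sysctl_conf_line("default", 16)) are likely the better place to change the value inherited by interfaces created later, but that should be checked against the kernel in use.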
Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday
Hi Tziporet,

My update is that I believe I know what is causing bug 1607, and I'm working on a fix.

Thanks.
-jeff

Steve Wise wrote:
> Status update:
> I cleaned up some NFSRDMA server crashes that happen when there are asynchronous WR failures. That might help Vu figure out 1571. I think there is a FW issue causing the async failure. But the code shouldn't crash anymore with my latest fix.
>
> But I'd also like 1613, 1616 into ofed-1.4.1:
> 1613: dapl regression that UNH uncovered. Arlin has a fix.
> 1616: nfsrdma ppc64 issue uncovered today. Hopefully we can nail this one by EOB friday
>
> Should we crank RC4 tomorrow and plan an RC5? Or hold off for a few more days with RC4?
>
> Steve.
RE: [ewg] OFED 1.4.1 RC4 is delayed to Thursday
I am running unit tests with the dapl fix (#1613) now and can have a new package later tonight. No need to delay for this bug.

-arlin

-----Original Message-----
From: Steve Wise [mailto:sw...@opengridcomputing.com]
Sent: Wednesday, April 29, 2009 2:46 PM
To: Tziporet Koren
Cc: Jon Mason; Vu Pham; ewg@lists.openfabrics.org; Davis, Arlin R; Vladimir Sokolovsky
Subject: Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

Status update:
I cleaned up some NFSRDMA server crashes that happen when there are asynchronous WR failures. That might help Vu figure out 1571. I think there is a FW issue causing the async failure. But the code shouldn't crash anymore with my latest fix.

But I'd also like 1613, 1616 into ofed-1.4.1:
1613: dapl regression that UNH uncovered. Arlin has a fix.
1616: nfsrdma ppc64 issue uncovered today. Hopefully we can nail this one by EOB friday

Should we crank RC4 tomorrow and plan an RC5? Or hold off for a few more days with RC4?

Steve.
[ewg] OFED 1.4.1 RC4 is delayed to Thursday
Hi All,

Since there are still a few open critical bugs, we are delaying the OFED 1.4.1-RC4 build to Thursday.
Note that we are on vacation on Wednesday this week (Israel Independence Day).

The bugs that must be fixed:

ID    Sev  OS    Assignee                     Summary
1607  blo  SLES  jeffrey.c.bec...@nasa.gov    kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
1609  cri  RHEL  sw...@opengridcomputing.com  kernel panic running iozone on x86 system
1571  cri  RHEL  v...@mellanox.com            nfsrdma server crash @test5 connectathon basic test,

Steve, Vu, and Jeff: I hope you will be able to fix these bugs by the end of Wednesday so we can do the RC4 build on Thursday morning (Israel time).

Thanks,
Tziporet
Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday
On Mon, Apr 27, 2009 at 05:43:05PM +0300, Tziporet Koren wrote:
> Hi All
> Since there are still few open critical bugs we delay OFED 1.4.1-RC4 build to Thursday.
> Note that we are on vacation on Wed this week (Israel Independence Day)
>
> The bugs that must be fixed:
> 1607 blo SLES jeffrey.c.bec...@nasa.gov kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
> 1609 cri RHEL sw...@opengridcomputing.com kernel panic running iozone on x86 system

This was fixed by the patch Steve pushed on Friday. I'll close the bug for him.

Thanks,
Jon

> 1571 cri RHEL v...@mellanox.com nfsrdma server crash @test5 connectathon basic test,
> Steve, Vu and Jeff - I hope you will be able to fix these bugs by end of Wed so we can do the RC4 build on Thu. morning (Israel time)
>
> Thanks,
> Tziporet
Re: [ewg] OFED 1.4.1 RC4 is delayed to Thursday
Jon Mason wrote:
> On Mon, Apr 27, 2009 at 05:43:05PM +0300, Tziporet Koren wrote:
>> Hi All
>> Since there are still few open critical bugs we delay OFED 1.4.1-RC4 build to Thursday.
>> Note that we are on vacation on Wed this week (Israel Independence Day)
>>
>> The bugs that must be fixed:
>> 1607 blo SLES jeffrey.c.bec...@nasa.gov kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
>> 1609 cri RHEL sw...@opengridcomputing.com kernel panic running iozone on x86 system
>
> This was fixed by the patch Steve pushed on Friday. I'll close the bug for him.

Well, it's too late now for us to build and test it.
What about bug 1571?

Tziporet
RE: [ewg] OFED 1.4.1 RC4 is delayed to Thursday
Hi Tziporet,

Thanks for the update. Can we also get bug 1287 https://bugs.openfabrics.org/show_bug.cgi?id=1287 resolved? The OFA Interop Event has begun, and this bug causes a failure that will have to be reported as part of the results on the OFA Logo site. We have both vendors and end users who seem to be concerned about IPoIB dropping packets, and I think we should address it; it is a known problem discovered approximately one year ago.

Thanks
Rupert

From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Tziporet Koren
Sent: Monday, April 27, 2009 10:43 AM
To: ewg@lists.openfabrics.org
Cc: Vu Pham
Subject: [ewg] OFED 1.4.1 RC4 is delayed to Thursday

Hi All
Since there are still few open critical bugs we delay OFED 1.4.1-RC4 build to Thursday.
Note that we are on vacation on Wed this week (Israel Independence Day)

The bugs that must be fixed:
1607 blo SLES jeffrey.c.bec...@nasa.gov kernel oops during login on sles10 sp2 with OFED-1.4.1-20...
1609 cri RHEL sw...@opengridcomputing.com kernel panic running iozone on x86 system
1571 cri RHEL v...@mellanox.com nfsrdma server crash @test5 connectathon basic test,

Steve, Vu and Jeff - I hope you will be able to fix these bugs by end of Wed so we can do the RC4 build on Thu. morning (Israel time)

Thanks,
Tziporet