Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

2012-03-02 Thread Hjelm, Nathan T
From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] on behalf of Christopher Samuel [sam...@unimelb.edu.au] Sent: Thursday, March 01, 2012 7:58 PM To: de...@open-mpi.org Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

2012-03-01 Thread Christopher Samuel
On 02/03/12 02:56, Nathan Hjelm wrote: > Found a pretty nasty frag leak (and a minor one) in ob1 (see > commit below). If this fix addresses some hangs we are seeing on > infiniband LANL might want a 1.4.6 rolled (or a faster rollout for > 1.6.0). Wh

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

2012-03-01 Thread Nathan Hjelm
I can confirm that neither leak is causing my imb hang. Unless there is another frag leak somewhere (haven't found one) the lockup was simply due to running out of registered memory. So, I see no need to push for a 1.4.6 unless a btl other than ugni hits the bug. Setting an rcache limit doesn'

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

2012-03-01 Thread George Bosilca
Good catch!!! That's indeed quite a nasty bug. If it fixes the IB issues it justifies a 1.4.6 release. Thanks, george. On Mar 1, 2012, at 10:56 , Nathan Hjelm wrote: > Found a pretty nasty frag leak (and a minor one) in ob1 (see commit below). > If this fix addresses some hangs we are se

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

2012-03-01 Thread Nathan Hjelm
On Thu, 1 Mar 2012, Jeffrey Squyres wrote: ...or in 1.5.5. Well, we want a "stable" release to deploy on the affected cluster. How soon will you be able to tell if it fixes some hangs? I will know in a couple of hours. Tested the fix in 1.4.5 and it appears to eliminate my IMB hang! I stil

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

2012-03-01 Thread Gutierrez, Samuel K
Hopefully by the end of the day - Nathan is testing now. Sam On Mar 1, 2012, at 11:36 AM, Jeffrey Squyres wrote: > ...or in 1.5.5. > > How soon will you be able to tell if it fixes some hangs? > > > On Mar 1, 2012, at 10:56 AM, Nathan Hjelm wrote: > >> Found a pretty nasty frag leak (and a

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

2012-03-01 Thread Jeffrey Squyres
...or in 1.5.5. How soon will you be able to tell if it fixes some hangs? On Mar 1, 2012, at 10:56 AM, Nathan Hjelm wrote: > Found a pretty nasty frag leak (and a minor one) in ob1 (see commit below). > If this fix addresses some hangs we are seeing on infiniband LANL might want > a 1.4.6 r