Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-11 Thread Steve Wise
Hey Jeff, what did you run to generate the memory corruption? Can you run the same test with --mca btl_openib_memalign_threshold 12288 and see if you get the same corruption? I'm not hitting any corruption over iw_cxgb4 with a simple test. On 6/10/2015 2:39 PM, Jeff Squyres (jsquyres)

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-11 Thread Steve Wise
ase 1.8.6, I recommend you revert the change that breaks things until we can figure this out. -Original Message- From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] Sent: Wednesday, June 10, 2015 3:30 PM To: Open MPI User's List Cc: Nathan Hjelm; Steve Wise Subject: Re: [OMPI users] D

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-10 Thread Ralph Castain
Squyres (jsquyres) [mailto:jsquy...@cisco.com] >> Sent: Wednesday, June 10, 2015 3:30 PM >> To: Open MPI User's List >> Cc: Nathan Hjelm; Steve Wise >> Subject: Re: [OMPI users] Default value of btl_openib_memalign_threshold >> >> Nathan / Steve -- you guys are n

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-10 Thread Steve Wise
c: Nathan Hjelm; Steve Wise > Subject: Re: [OMPI users] Default value of btl_openib_memalign_threshold > > Nathan / Steve -- you guys are nominally the owners of the openib BTL: can > you please investigate? > > > > On Jun 10, 2015, at 4:15 PM, Ralph Castain <r...@

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-10 Thread Jeff Squyres (jsquyres)
Nathan / Steve -- you guys are nominally the owners of the openib BTL: can you please investigate? > On Jun 10, 2015, at 4:15 PM, Ralph Castain wrote: > > Odd - without that setting, the value is essentially undefined, so it’s hard > to understand how that is any better.

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-10 Thread Ralph Castain
Odd - without that setting, the value is essentially undefined, so it’s hard to understand how that is any better. Maybe the whole alignment thing is busted, and leaving it undefined (which usually defaults to zero, but not always) causes it to be turned “off”? I don’t really care, mind you -

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-10 Thread Jeff Squyres (jsquyres)
Ralph -- This change was not correct (https://github.com/open-mpi/ompi/commit/ce915b5757d428d3e914dcef50bd4b2636561bca). It is causing memory corruption in the openib BTL. > On May 25, 2015, at 11:56 AM, Ralph Castain wrote: > > I don’t see a problem with it. FWIW: I’m

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-05-25 Thread Ralph Castain
I don’t see a problem with it. FWIW: I’m getting ready to release 1.8.6 in the next week > On May 25, 2015, at 8:46 AM, Xavier Besseron wrote: > > Good that it will be fixed in the next release! > > In the meantime, and because it might impact other users, > I would

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-05-25 Thread Xavier Besseron
Good that it will be fixed in the next release! In the meantime, and because it might impact other users, I would like to ask my sysadmins to set btl_openib_memalign_threshold=12288 in etc/openmpi-mca-params.conf on our clusters. Do you see any good reason not to do it? Thanks! Xavier On
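
For illustration, the site-wide setting Xavier describes might look like the fragment below. This is a sketch rather than his actual file: the path is relative to the Open MPI install prefix (the one given to --prefix, written here as the placeholder <prefix>), and the value 12288 is the one quoted in this thread.

```
# <prefix>/etc/openmpi-mca-params.conf
# Workaround from this thread: set the openib memalign threshold explicitly.
btl_openib_memalign_threshold = 12288
```

The same value can be tried for a single run with --mca btl_openib_memalign_threshold 12288 on the mpirun command line, as Steve suggests elsewhere in the thread.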

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-05-25 Thread Ralph Castain
I found the problem - someone had a typo in btl_openib_mca.c. The threshold needs to be set to the module eager limit as that is the only thing defined at that point. Thanks for bringing it to our attention! I’ll set it up to go into 1.8.6 > On May 25, 2015, at 3:04 AM, Xavier Besseron

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-05-25 Thread Xavier Besseron
Hi, Thanks for your reply Ralph. The only option I'm using when configuring OpenMPI is '--prefix'. When checking the config.log file, I see configure:208504: checking whether the openib BTL will use malloc hooks configure:208510: result: yes so I guess it is properly enabled (full config.log
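
The config.log check quoted above can be scripted. The sketch below is only an illustration: the function name is hypothetical, and it assumes the "result:" line directly follows the "checking" line, as it does in the excerpt Xavier quotes.

```shell
# Hypothetical helper: succeeds if config.log reports "result: yes" on the
# line right after the openib malloc hooks check.
# Assumption: no other configure output sits between the two lines.
openib_malloc_hooks_enabled() {
    grep -A1 'checking whether the openib BTL will use malloc hooks' "$1" \
        | grep -q 'result: yes'
}

# Usage (the config.log path depends on where Open MPI was built):
# openib_malloc_hooks_enabled ./config.log && echo "malloc hooks enabled"
```

A successful exit status only confirms what configure detected; it says nothing about the value the threshold ends up with at run time.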

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-05-24 Thread Ralph Castain
Looking at the code, we do in fact set the memalign_threshold = eager_limit by default, but only if you configured with --enable-btl-openib-malloc-alignment AND/OR we found the malloc hook functions were available. You might check config.log to see if the openib malloc hooks were enabled. My