Hey Jeff, what did you run to generate the memory corruption? Can you
run the same test with --mca btl_openib_memalign_threshold 12288 and see
if you get the same corruption? I'm not hitting any corruption over
iw_cxgb4 with a simple test.
On 6/10/2015 2:39 PM, Jeff Squyres (jsquyres)
ase 1.8.6, I recommend you revert the change
that breaks things until we can figure this out.
-Original Message-
From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
Sent: Wednesday, June 10, 2015 3:30 PM
To: Open MPI User's List
Cc: Nathan Hjelm; Steve Wise
Subject: Re: [OMPI users] Default value of btl_openib_memalign_threshold
Nathan / Steve -- you guys are nominally the owners of the openib BTL: can you
please investigate?
> On Jun 10, 2015, at 4:15 PM, Ralph Castain wrote:
Odd - without that setting, the value is essentially undefined, so it’s hard to
understand how that is any better. Maybe the whole alignment thing is busted,
and leaving it undefined (which usually defaults to zero, but not always)
causes it to be turned “off”?
I don’t really care, mind you -
Ralph --
This change was not correct
(https://github.com/open-mpi/ompi/commit/ce915b5757d428d3e914dcef50bd4b2636561bca).
It is causing memory corruption in the openib BTL.
> On May 25, 2015, at 11:56 AM, Ralph Castain wrote:
I don’t see a problem with it. FWIW: I’m getting ready to release 1.8.6 in the
next week
> On May 25, 2015, at 8:46 AM, Xavier Besseron wrote:
Good that it will be fixed in the next release!
In the meantime, and because it might impact other users,
I would like to ask my sysadmins to set btl_openib_memalign_threshold=12288
in etc/openmpi-mca-params.conf on our clusters.
Do you see any good reason not to do it?
Thanks!
Xavier
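For reference, a minimal sketch of the usual ways an MCA parameter such as
this one can be set: on the mpirun command line, through an OMPI_MCA_-prefixed
environment variable, or as a "name = value" line in openmpi-mca-params.conf.
The helper name mca_setting_forms is hypothetical and only illustrates the
three forms; it does not run or configure Open MPI.

```python
def mca_setting_forms(name, value):
    """Return three equivalent ways to express one MCA parameter setting.

    Hypothetical helper for illustration only; it builds strings and
    does not touch any Open MPI installation.
    """
    return {
        # Arguments to append to an mpirun command line.
        "command_line": ["--mca", name, str(value)],
        # Environment variable form: OMPI_MCA_<name>=<value>.
        "environment": {"OMPI_MCA_" + name: str(value)},
        # Line to place in etc/openmpi-mca-params.conf.
        "params_file_line": "{} = {}".format(name, value),
    }


forms = mca_setting_forms("btl_openib_memalign_threshold", 12288)
print(forms["params_file_line"])
```

The params-file form is the one a sysadmin would use for a cluster-wide
default; the command-line form is handy for a one-off test run.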
On
I found the problem - someone had a typo in btl_openib_mca.c. The threshold
needs to be set to the module eager limit as that is the only thing defined at
that point.
Thanks for bringing it to our attention! I’ll set it up to go into 1.8.6
> On May 25, 2015, at 3:04 AM, Xavier Besseron
Hi,
Thanks for your reply Ralph.
The only option I'm using when configuring OpenMPI is '--prefix'.
When checking the config.log file, I see
configure:208504: checking whether the openib BTL will use malloc hooks
configure:208510: result: yes
so I guess it is properly enabled (full config.log
Looking at the code, we do in fact set the memalign_threshold = eager_limit by
default, but only if you configured with --enable-btl-openib-malloc-alignment
AND/OR we found the malloc hook functions were available.
You might check config.log to see if the openib malloc hooks were enabled. My
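The config.log check suggested above can be sketched as a small script. This
assumes the log contains the "checking whether the openib BTL will use malloc
hooks" line followed by a "configure:NNN: result: ..." line, as in the excerpt
quoted in this thread; the function name is hypothetical.

```python
import re

CHECK = "checking whether the openib BTL will use malloc hooks"


def openib_malloc_hooks_result(log_text):
    """Return the configure result for the malloc hooks check, or None.

    Takes the first "result:" line that follows the check line, which
    is an assumption about how config.log is laid out.
    """
    seen_check = False
    for line in log_text.splitlines():
        if CHECK in line:
            seen_check = True
            continue
        if seen_check:
            match = re.match(r"configure:\d+: result: (.*)", line)
            if match:
                return match.group(1).strip()
    return None


# Sample taken from the config.log lines quoted in this thread.
sample = (
    "configure:208504: checking whether the openib BTL will use malloc hooks\n"
    "configure:208510: result: yes\n"
)
print(openib_malloc_hooks_result(sample))
```

Against a real build tree, the same function could be given the contents of
config.log, e.g. open("config.log").read().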