Hi Justin,
If you can build the application in debug mode, try inserting valgrind into
your MPI command. It is usually very good at tracking down the origins of
failing memory allocations.
Kind regards,
- Dmitry.
2017-06-20 1:10 GMT+03:00 Sylvain Jeaugey :
Justin, can you try setting mpi_leave_pinned to 0 to disable
libptmalloc2 and confirm this is related to ptmalloc?
Thanks,
Sylvain
On 06/19/2017 03:05 PM, Justin Luitjens wrote:
I have an application that works on other systems but on the current system I'm
running I'm seeing the following crash:
[dt04:22457] *** Process received signal ***
[dt04:22457] Signal: Segmentation fault (11)
[dt04:22457] Signal code: Address not mapped (1)
[dt04:22457] Failing at address: 0x555
I don't do any setting of process groups. dum.sh just invokes the executable:
//aborttest10.exe
On 19 Jun 2017 at 10:30, r...@open-mpi.org wrote:
When you fork that process off, do you set its process group? Or is it in the
same process group as the shell script?
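
To make the process-group question concrete, here is a minimal C sketch of the two cases being asked about. It is not code from the thread or from Open MPI: the helper names spawn_idle_child and stop_child are hypothetical, and only standard POSIX calls (fork, setpgid, getpgid, kill, waitpid) are assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that idles until it is signalled.
 * If own_group is nonzero, the child is moved into a new process group
 * whose id equals its own pid; otherwise it stays in the caller's group.
 * Returns the child's pid, or -1 if fork() fails. */
pid_t spawn_idle_child(int own_group)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (own_group)
            setpgid(0, 0);      /* child: become leader of a new group */
        for (;;)
            pause();            /* wait for a signal */
    }
    if (own_group)
        setpgid(pid, pid);      /* parent: same call, closes the startup race */
    return pid;
}

/* Hypothetical helper: kill one child by pid and reap it.
 * Returns the signal that terminated it, or -1 on error. */
int stop_child(pid_t pid)
{
    int status = 0;
    assert(pid > 0);            /* pid <= 0 would address a group, not one process */
    if (kill(pid, SIGKILL) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A child created with spawn_idle_child(0) shares the caller's process group, so a signal addressed to that group reaches it. A child created with spawn_idle_child(1) does not receive such a signal and has to be signalled by its own pid or group id. Whether this explains the leftover "aborttest" process is not established in the thread.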
> On Jun 19, 2017, at 10:19 AM, Ted Sussman wrote:
If I replace the sleep with an infinite loop, I get the same behavior. One
"aborttest" process remains after all the signals are sent.
On 19 Jun 2017 at 10:10, r...@open-mpi.org wrote:
>
> That is typical behavior when you throw something into "sleep" - not much we
> can do about it, I
> th
Hello,
I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
I have attached the abort test program aborttest10.tgz. This version sleeps
for 5 sec before calling MPI_ABORT, so that I can check the pids using ps.
This is what happens (see run2.sh.out).
Open MPI invokes
Ashwin,
The valgrind logs clearly indicate that you are trying to access memory
that was already freed, for example:
[1,0]:==4683== Invalid read of size 4
[1,0]:==4683==    at 0x795DC2: __src_input_MOD_organize_input (src_input.f90:2318)
[1,0]:==4683== Address 0xb4001d0 is 0 bytes inside a