Re: [OMPI users] Using mtrace with openmpi segfaults

2007-12-07 Thread Jeff Squyres
I'm not sure what using mallopt would do when combined with Open MPI's  
ptmalloc, but I can't imagine that it would be anything good.


If you want users to be able to use mallopt, you should probably  
disable Open MPI's ptmalloc.
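
A minimal sketch of that rebuild, assuming a build from source with the
install prefix that appears later in this thread (/usr/local/openmpi); the
last line checks the result against the "MCA memory: ptmalloc2" entry
visible in the ompi_info output at the bottom of the thread:

   ./configure --without-memory-manager --prefix=/usr/local/openmpi
   make all install
   # after reinstalling, this should no longer report ptmalloc2:
   ompi_info --all | grep "MCA memory"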



On Dec 6, 2007, at 10:16 AM, Jeffrey M Ceason wrote:



Is there a way to disable this at runtime?  Also, can a user app use
mallopt options without interfering with the memory managers?


We have these options set, but we are seeing memory corruption that
moves around realloc calls in the program.


 mallopt(M_MMAP_MAX, 0);
 mallopt(M_TRIM_THRESHOLD, -1);




Re: [OMPI users] Using mtrace with openmpi segfaults

2007-12-06 Thread Jeff Squyres

On Dec 6, 2007, at 10:14 AM, Sajjad Tabib wrote:

Is it possible to disable ptmalloc2 at runtime by disabling the  
component?


Nope -- this one has to be compiled and linked in ahead of time.   
Sorry.  :-\


--
Jeff Squyres
Cisco Systems


Re: [OMPI users] Using mtrace with openmpi segfaults

2007-12-06 Thread Jeffrey M Ceason
Is there a way to disable this at runtime?  Also, can a user app use
mallopt options without interfering with the memory managers?

We have these options set, but we are seeing memory corruption that
moves around realloc calls in the program.

 mallopt(M_MMAP_MAX, 0);
 mallopt(M_TRIM_THRESHOLD, -1);
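
For context, a self-contained sketch of what those two calls do (a
reconstruction, not the original program): under glibc, M_MMAP_MAX set to 0
keeps malloc from satisfying large allocations with mmap, and
M_TRIM_THRESHOLD set to -1 keeps the heap from being trimmed back to the
kernel, so allocated pages stay put.

#include <malloc.h>   /* mallopt, M_MMAP_MAX, M_TRIM_THRESHOLD (glibc) */

int main(void) {
    /* Call before the first allocation so the settings cover the whole run. */
    mallopt(M_MMAP_MAX, 0);        /* never use mmap for allocations */
    mallopt(M_TRIM_THRESHOLD, -1); /* never return heap memory to the kernel */
    /* ... application allocations here ... */
    return 0;
}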





Jeff Squyres <jsquy...@cisco.com> wrote on 12/06/2007 07:44 AM:

I have not tried to use mtrace myself.  But I can see how it would be 
problematic with OMPI's internal use of ptmalloc2.  If you are not 
using InfiniBand or Myrinet over GM, you don't need OMPI to have an 
internal copy of ptmalloc2.  You can disable OMPI's ptmalloc2 by 
configuring with:

   ./configure --without-memory-manager



Re: [OMPI users] Using mtrace with openmpi segfaults

2007-12-06 Thread Sajjad Tabib
Hi,

Is it possible to disable ptmalloc2 at runtime by disabling the component? 


Thanks,

Sajjad Tabib





[OMPI users] Using mtrace with openmpi segfaults

2007-12-03 Thread Jeffrey M Ceason
Having trouble using mtrace with openmpi.  Whenever I use the mtrace call
before or after MPI_Init, the application terminates.  This only seems to
happen using mpi.  Is there a way to disable the open-mpi memory wrappers?
Are there known issues with user applications using mallopts alongside the
mallopts used by open-mpi?

Machine is AMD64 Fedora Core 7
[ceason@n01-044-0 minib]$ uname -a
Linux n01-044-0 2.6.22-rr #1 SMP Fri Nov 16 15:28:53 CST 2007 x86_64 
x86_64 x86_64 GNU/Linux
[ceason@n01-044-0 minib]$ 


Test source.
#include <mcheck.h>   /* for mtrace(); remaining header names were stripped in archiving */
#include <mpi.h>

using namespace std;

int main(int argc, char *argv[]) {
    mtrace();
    MPI_Init(NULL, NULL);

    MPI_Finalize();
}
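
A note for anyone reproducing this: glibc's mtrace() records nothing unless
the MALLOC_TRACE environment variable points at a writable log file, which
is then decoded with glibc's mtrace script. A sketch, with illustrative
file names:

export MALLOC_TRACE=/tmp/mtrace.log      # mtrace() writes its log here
mpirun -x MALLOC_TRACE -np 1 trace_test  # -x forwards the variable to the ranks
mtrace trace_test /tmp/mtrace.log        # decode the log against the binary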


[ceason@n01-044-0 minib]$ mpiCC dacs_test.cc -o trace_test
[ceason@n01-044-0 minib]$ mpirun -np 1 trace_test 
mpirun noticed that job rank 0 with PID 7078 on node n01-044-0 exited on 
signal 8 (Floating point exception). 
[ceason@n01-044-0 minib]$ 


backtrace of core

Core was generated by `trace_test'.
Program terminated with signal 8, Arithmetic exception.
#0  0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6
(gdb) bt
#0  0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6
#1  0x2b33169d71c2 in _int_free () from /lib64/libc.so.6
#2  0x2b33169dab1c in free () from /lib64/libc.so.6
#3  0x2b33169dcee8 in tr_freehook () from /lib64/libc.so.6
#4  0x2b33157674f3 in free () from 
/usr/local/openmpi/lib64/libopen-pal.so.0
#5  0x2b33169ceaf1 in vasprintf () from /lib64/libc.so.6
#6  0x2b33169b3088 in asprintf () from /lib64/libc.so.6
#7  0x2b3315760c7d in opal_output_init () from 
/usr/local/openmpi/lib64/libopen-pal.so.0
#8  0x2b3315760a2a in do_open () from 
/usr/local/openmpi/lib64/libopen-pal.so.0
#9  0x2b331575f958 in opal_malloc_init () from 
/usr/local/openmpi/lib64/libopen-pal.so.0
#10 0x2b331574ac27 in opal_init_util () from 
/usr/local/openmpi/lib64/libopen-pal.so.0
#11 0x2b331574ad06 in opal_init () from 
/usr/local/openmpi/lib64/libopen-pal.so.0
#12 0x2b3315283edf in ompi_mpi_init () from 
/usr/local/openmpi/lib64/libmpi.so.0
#13 0x2b33152a54f0 in PMPI_Init () from 
/usr/local/openmpi/lib64/libmpi.so.0
#14 0x00408397 in main ()
(gdb) 

Shouldn't involve communications between machines but here is the IB Info.

[ceason@n01-044-0 minib]$ ibv_devinfo 
hca_id: mlx4_0
fw_ver: 2.2.000
node_guid:  0002:c903::17d0
sys_image_guid: 0002:c903::17d3
vendor_id:  0x02c9
vendor_part_id: 25418
hw_ver: 0xA0
board_id:   MT_04A0110002
phys_port_cnt:  2
port:   1
state:  PORT_ACTIVE (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid:   8
port_lmc:   0x00

port:   2
state:  PORT_DOWN (1)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 0
port_lid:   0
port_lmc:   0x00

[ceason@n01-044-0 minib]$ 

[ceason@n01-044-0 minib]$ ulimit -l
unlimited




ompi_info -all output

Open MPI: 1.2.4
   Open MPI SVN revision: r16187
Open RTE: 1.2.4
   Open RTE SVN revision: r16187
OPAL: 1.2.4
   OPAL SVN revision: r16187
   MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.4)
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.4)
   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.4)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.4)
   MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.4)
   MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.4)
 MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.4)
 MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.4)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.4)
MCA coll: self (MCA v1.0, API v1.0, Component v1.2.4)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.4)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.4)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.4)
   MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.4)
   MCA mpool: sm (MCA v1.0, API v1.0, Component