Re: [OMPI users] Using mtrace with openmpi segfaults
I'm not sure what using mallopt would do when combined with Open MPI's ptmalloc, but I can't imagine that it would be anything good. If you want users to be able to use mallopt, you should probably disable Open MPI's ptmalloc.

On Dec 6, 2007, at 10:16 AM, Jeffrey M Ceason wrote:

> Is there a way to disable this at runtime? Also, can a user app use mallopt options without interfering with the memory managers? We have these options set but are getting memory corruption that moves around realloc in the program.
>
>     mallopt(M_MMAP_MAX, 0);
>     mallopt(M_TRIM_THRESHOLD, -1);
Re: [OMPI users] Using mtrace with openmpi segfaults
On Dec 6, 2007, at 10:14 AM, Sajjad Tabib wrote:

> Is it possible to disable ptmalloc2 at runtime by disabling the component?

Nope -- this one has to be compiled and linked in ahead of time. Sorry. :-\

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] Using mtrace with openmpi segfaults
Is there a way to disable this at runtime? Also, can a user app use mallopt options without interfering with the memory managers? We have these options set but are getting memory corruption that moves around realloc in the program.

    mallopt(M_MMAP_MAX, 0);
    mallopt(M_TRIM_THRESHOLD, -1);

Jeff Squyres <jsquy...@cisco.com> wrote on 12/06/2007 07:44 AM:

> I have not tried to use mtrace myself. But I can see how it would be problematic with OMPI's internal use of ptmalloc2.
>
> If you are not using InfiniBand or Myrinet over GM, you don't need OMPI to have an internal copy of ptmalloc2. You can disable OMPI's ptmalloc2 by configuring with:
>
>     ./configure --without-memory-manager
Re: [OMPI users] Using mtrace with openmpi segfaults
Hi,

Is it possible to disable ptmalloc2 at runtime by disabling the component?

Thanks,

Sajjad Tabib

Jeff Squyres <jsquy...@cisco.com> wrote on 12/06/07 07:44 AM:

> If you are not using InfiniBand or Myrinet over GM, you don't need OMPI to have an internal copy of ptmalloc2. You can disable OMPI's ptmalloc2 by configuring with:
>
>     ./configure --without-memory-manager
[OMPI users] Using mtrace with openmpi segfaults
Having trouble using mtrace with openmpi. Whenever I use the mtrace call before or after MPI_Init, the application terminates. This only seems to happen using MPI. Is there a way to disable the open-mpi memory wrappers? Are there known issues with user applications using mallopts alongside the mallopts used by open-mpi?

Machine is AMD64 Fedora Core 7:

    [ceason@n01-044-0 minib]$ uname -a
    Linux n01-044-0 2.6.22-rr #1 SMP Fri Nov 16 15:28:53 CST 2007 x86_64 x86_64 x86_64 GNU/Linux

Test source:

    #include <mpi.h>
    #include <mcheck.h>

    using namespace std;

    int main (int argc, char *argv[]) {
        mtrace();
        MPI_Init(NULL, NULL);
        MPI_Finalize();
    }

    [ceason@n01-044-0 minib]$ mpiCC dacs_test.cc -o trace_test
    [ceason@n01-044-0 minib]$ mpirun -np 1 trace_test
    mpirun noticed that job rank 0 with PID 7078 on node n01-044-0 exited on signal 8 (Floating point exception).

backtrace of core:

    Core was generated by `trace_test'.
    Program terminated with signal 8, Arithmetic exception.
    #0  0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6
    (gdb) bt
    #0  0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6
    #1  0x2b33169d71c2 in _int_free () from /lib64/libc.so.6
    #2  0x2b33169dab1c in free () from /lib64/libc.so.6
    #3  0x2b33169dcee8 in tr_freehook () from /lib64/libc.so.6
    #4  0x2b33157674f3 in free () from /usr/local/openmpi/lib64/libopen-pal.so.0
    #5  0x2b33169ceaf1 in vasprintf () from /lib64/libc.so.6
    #6  0x2b33169b3088 in asprintf () from /lib64/libc.so.6
    #7  0x2b3315760c7d in opal_output_init () from /usr/local/openmpi/lib64/libopen-pal.so.0
    #8  0x2b3315760a2a in do_open () from /usr/local/openmpi/lib64/libopen-pal.so.0
    #9  0x2b331575f958 in opal_malloc_init () from /usr/local/openmpi/lib64/libopen-pal.so.0
    #10 0x2b331574ac27 in opal_init_util () from /usr/local/openmpi/lib64/libopen-pal.so.0
    #11 0x2b331574ad06 in opal_init () from /usr/local/openmpi/lib64/libopen-pal.so.0
    #12 0x2b3315283edf in ompi_mpi_init () from /usr/local/openmpi/lib64/libmpi.so.0
    #13 0x2b33152a54f0 in PMPI_Init () from /usr/local/openmpi/lib64/libmpi.so.0
    #14 0x00408397 in main ()
    (gdb)

Shouldn't involve communications between machines, but here is the IB info.
    [ceason@n01-044-0 minib]$ ibv_devinfo
    hca_id: mlx4_0
        fw_ver:             2.2.000
        node_guid:          0002:c903::17d0
        sys_image_guid:     0002:c903::17d3
        vendor_id:          0x02c9
        vendor_part_id:     25418
        hw_ver:             0xA0
        board_id:           MT_04A0110002
        phys_port_cnt:      2
        port: 1
            state:          PORT_ACTIVE (4)
            max_mtu:        2048 (4)
            active_mtu:     2048 (4)
            sm_lid:         1
            port_lid:       8
            port_lmc:       0x00
        port: 2
            state:          PORT_DOWN (1)
            max_mtu:        2048 (4)
            active_mtu:     2048 (4)
            sm_lid:         0
            port_lid:       0
            port_lmc:       0x00

    [ceason@n01-044-0 minib]$ ulimit -l
    unlimited

ompi_info -all output:

    Open MPI: 1.2.4
    Open MPI SVN revision: r16187
    Open RTE: 1.2.4
    Open RTE SVN revision: r16187
    OPAL: 1.2.4
    OPAL SVN revision: r16187
    MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.4)
    MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.4)
    MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.4)
    MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.4)
    MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.4)
    MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.4)
    MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.4)
    MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.4)
    MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
    MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
    MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.4)
    MCA coll: self (MCA v1.0, API v1.0, Component v1.2.4)
    MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.4)
    MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.4)
    MCA io: romio (MCA v1.0, API v1.0, Component v1.2.4)
    MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.4)
    MCA mpool: sm (MCA v1.0, API v1.0, Component