Re: [OMPI users] Using mtrace with openmpi segfaults
Hi,

Is it possible to disable ptmalloc2 at runtime by disabling the component?

Thanks,
Sajjad Tabib

Jeff Squyres <jsquy...@cisco.com>
Sent by: users-boun...@open-mpi.org
12/06/07 07:44 AM
Please respond to: Open MPI Users <us...@open-mpi.org>
To: Open MPI Users <us...@open-mpi.org>
cc:
Subject: Re: [OMPI users] Using mtrace with openmpi segfaults

I have not tried to use mtrace myself. But I can see how it would be
problematic with OMPI's internal use of ptmalloc2. If you are not using
InfiniBand or Myrinet over GM, you don't need OMPI to have an internal
copy of ptmalloc2. You can disable OMPI's ptmalloc2 by configuring with:

  ./configure --without-memory-manager

On Dec 3, 2007, at 6:23 PM, Jeffrey M Ceason wrote:

>
> Having trouble using mtrace with openmpi. Whenever I use the mtrace
> call before or after MPI_Init the application terminates. This only
> seems to happen using mpi. Is there a way to disable the open-mpi
> memory wrappers? Is there known issues with users applications
> using mallopts and the mallopts used by open-mpi?
>
> Machine is AMD64 Fedora Core 7
> [ceason@n01-044-0 minib]$ uname -a
> Linux n01-044-0 2.6.22-rr #1 SMP Fri Nov 16 15:28:53 CST 2007 x86_64 x86_64 x86_64 GNU/Linux
> [ceason@n01-044-0 minib]$
>
>
> Test source.
> #include
> #include
> #include
> #include
> #include
> #include
> #include
>
> using namespace std;
>
> int main (int argc,char * argv[]) {
>   mtrace();
>   MPI_Init(NULL,NULL);
>
>   MPI_Finalize();
> }
>
>
> [ceason@n01-044-0 minib]$ mpiCC dacs_test.cc -o trace_test
> [ceason@n01-044-0 minib]$ mpirun -np 1 trace_test
> mpirun noticed that job rank 0 with PID 7078 on node n01-044-0 exited on signal 8 (Floating point exception).
> [ceason@n01-044-0 minib]$
>
>
> backtrace of core
>
> Core was generated by `trace_test'.
> Program terminated with signal 8, Arithmetic exception.
> #0 0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6
> (gdb) bt
> #0 0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6
> #1 0x2b33169d71c2 in _int_free () from /lib64/libc.so.6
> #2 0x2b33169dab1c in free () from /lib64/libc.so.6
> #3 0x2b33169dcee8 in tr_freehook () from /lib64/libc.so.6
> #4 0x2b33157674f3 in free () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #5 0x2b33169ceaf1 in vasprintf () from /lib64/libc.so.6
> #6 0x2b33169b3088 in asprintf () from /lib64/libc.so.6
> #7 0x2b3315760c7d in opal_output_init () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #8 0x2b3315760a2a in do_open () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #9 0x2b331575f958 in opal_malloc_init () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #10 0x2b331574ac27 in opal_init_util () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #11 0x2b331574ad06 in opal_init () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #12 0x2b3315283edf in ompi_mpi_init () from /usr/local/openmpi/lib64/libmpi.so.0
> #13 0x2b33152a54f0 in PMPI_Init () from /usr/local/openmpi/lib64/libmpi.so.0
> #14 0x00408397 in main ()
> (gdb)
>
> Shouldn't involve communications between machines but here is the IB
> Info.
>
> [ceason@n01-044-0 minib]$ ibv_devinfo
> hca_id: mlx4_0
>     fw_ver: 2.2.000
>     node_guid: 0002:c903::17d0
>     sys_image_guid: 0002:c903::17d3
>     vendor_id: 0x02c9
>     vendor_part_id: 25418
>     hw_ver: 0xA0
>     board_id: MT_04A0110002
>     phys_port_cnt: 2
>         port: 1
>             state: PORT_ACTIVE (4)
>             max_mtu: 2048 (4)
>             active_mtu: 2048 (4)
>             sm_lid: 1
>             port_lid: 8
>             port_lmc: 0x00
>
>         port: 2
>             state: PORT_DOWN (1)
>             max_mtu: 2048 (4)
>             active_mtu: 2048 (4)
>             sm_lid: 0
>             port_lid: 0
>             port_lmc: 0x00
>
> [ceason@n01-044-0 minib]$
>
> [ceason@n01-044-0 minib]$ ulimit -l
> unlimited
>
>
>
>
> ompi_info -all output
>
> Open MPI: 1.2.4
> Open MPI SVN revision: r16187
>
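
After rebuilding with the configure option given in the reply above, one way to check the result might be to look for a ptmalloc2 entry in the component list printed by ompi_info. The following is only a sketch: it assumes the component shows up as a line of the form "MCA memory: ptmalloc2 (...)", and the helper names has_ptmalloc2 and report_memory_manager are hypothetical, not part of Open MPI.

```shell
#!/bin/sh
# has_ptmalloc2 (hypothetical helper): reads `ompi_info` output on stdin and
# succeeds if a ptmalloc2 memory component line is present.
# Assumption: the component is listed like
#   MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.4)
has_ptmalloc2() {
    grep -Eq '^[[:space:]]*MCA memory:[[:space:]]*ptmalloc2'
}

# report_memory_manager (hypothetical helper): prints a one-line summary
# based on the `ompi_info` output it receives on stdin.
report_memory_manager() {
    if has_ptmalloc2; then
        printf '%s\n' "ptmalloc2 present"
    else
        printf '%s\n' "ptmalloc2 absent"
    fi
}
```

A typical invocation would be "ompi_info | report_memory_manager". Reading from stdin rather than calling ompi_info inside the function keeps the check usable on saved output from another machine.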
Re: [OMPI users] Open MPI can't open PML cm
Hi George,

Thanks for your response. I found a bug in my MTL code that had
propagated up to the PML, which was causing that error.

Sajjad Tabib

Message: 2
List-Post: users@lists.open-mpi.org
Date: Wed, 17 Oct 2007 12:24:53 -0400
From: George Bosilca <bosi...@eecs.utk.edu>
Subject: Re: [OMPI users] Open MPI can't open PML cm
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <8561cfaa-764a-4c61-a6b1-cdc74f1cd...@eecs.utk.edu>
Content-Type: text/plain; charset="us-ascii"

The CM PML only supports networks that do matching in hardware. In Open
MPI terms, the CM PML requires one of the MTLs (instead of the BTLs used
by all other PMLs). For a full list of supported networks, one can list
the contents of the ompi/mca/mtl directory (right now: Myrinet, Portals
and QLogic). If your environment does not have any of these networks,
then CM cannot be used. Moreover, because you force the PML to CM on the
command line and CM fails to load, Open MPI gives up, reporting that no
PML components were found.

Thanks,
george.

On Oct 17, 2007, at 12:02 PM, Sajjad Tabib wrote:

>
> Hi,
>
> I am trying to use the cm component from pml, but when I execute
> the command: "mpirun -np 2 --mca pml cm ompi_test", I get the error
> message that "No available pml components were found". I ran
> ompi_info to see if the cm component exists, and it does. The output
> of ">ompi_info | grep cm" was "MCA pml: cm (MCA v1.0, API v1.0,
> Component v1.3)". I have also set my LD_LIBRARY_PATH as instructed
> by the FAQs. I have even reconfigured and rebuilt open-mpi, but
> that didn't fix the problem either. I am wondering whether a
> process on my system could interfere with opening the cm component.
> I don't know the answer to this, but thought that I should throw it
> out there.
> Anyways, I am not sure what to do next to troubleshoot this problem
> and was hoping that somebody could give me pointers on what might
> be wrong or what I could check/do next.
>
> Thank You,
>
> Sajjad Tabib
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
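
Since the CM PML depends on an MTL being available, a quick pre-check before forcing "--mca pml cm" could be to see whether ompi_info lists any MTL component at all. The sketch below assumes MTL components appear as lines of the form "MCA mtl: <name> (...)", by analogy with the "MCA pml: cm (...)" line quoted above; the helper name pml_args is hypothetical. A listed MTL only shows that the component was built, not that the matching network hardware is present, so this is a necessary check rather than a sufficient one.

```shell
#!/bin/sh
# pml_args (hypothetical helper): reads `ompi_info` output on stdin.
# Prints "--mca pml cm" if at least one MTL component line is found,
# and prints nothing otherwise, so that mpirun falls back to its
# default PML selection.
# Assumption: MTL components are listed as "MCA mtl: <name> (...)".
pml_args() {
    if grep -Eq '^[[:space:]]*MCA mtl:'; then
        printf '%s\n' "--mca pml cm"
    fi
}
```

It could be used as "mpirun -np 2 $(ompi_info | pml_args) ompi_test", which only forces CM when some MTL is reported.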
[OMPI users] Open MPI can't open PML cm
Hi,

I am trying to use the cm component from pml, but when I execute the
command: "mpirun -np 2 --mca pml cm ompi_test", I get the error message
that "No available pml components were found". I ran ompi_info to see
if the cm component exists, and it does. The output of
">ompi_info | grep cm" was "MCA pml: cm (MCA v1.0, API v1.0, Component
v1.3)". I have also set my LD_LIBRARY_PATH as instructed by the FAQs. I
have even reconfigured and rebuilt open-mpi, but that didn't fix the
problem either. I am wondering whether a process on my system could
interfere with opening the cm component. I don't know the answer to
this, but thought that I should throw it out there.

Anyways, I am not sure what to do next to troubleshoot this problem and
was hoping that somebody could give me pointers on what might be wrong
or what I could check/do next.

Thank You,

Sajjad Tabib