Re: [OMPI users] Using mtrace with openmpi segfaults

2007-12-06 Thread Sajjad Tabib
Hi,

Is it possible to disable ptmalloc2 at runtime by disabling the component? 


Thanks,

Sajjad Tabib




Jeff Squyres <jsquy...@cisco.com> wrote on 12/06/07 07:44 AM
(sent by users-boun...@open-mpi.org to Open MPI Users <us...@open-mpi.org>):

I have not tried to use mtrace myself.  But I can see how it would be 
problematic with OMPI's internal use of ptmalloc2.  If you are not 
using InfiniBand or Myrinet over GM, you don't need OMPI to have an 
internal copy of ptmalloc2.  You can disable OMPI's ptmalloc2 by 
configuring with:

   ./configure --without-memory-manager
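
Before rebuilding, it may be worth confirming that mtrace works on the
machine when Open MPI is not linked in, since the report quoted below
says the crash only appears with MPI. A minimal control sketch, assuming
glibc's <mcheck.h> is available; traced_roundtrip is a hypothetical
helper name, not part of glibc or Open MPI:

```cpp
#include <mcheck.h>   // mtrace(), muntrace() (glibc-specific)
#include <cstdlib>    // std::malloc, std::free, std::size_t
#include <cstring>    // std::memset

// Hypothetical helper: performs one traced allocate/free cycle without MPI.
// Returns the number of bytes allocated and released, or 0 if malloc failed.
std::size_t traced_roundtrip(std::size_t nbytes) {
    mtrace();                       // starts logging only if MALLOC_TRACE is set
    void* p = std::malloc(nbytes);
    if (p == nullptr) {
        muntrace();
        return 0;
    }
    std::memset(p, 0, nbytes);      // touch the memory so the allocation is used
    std::free(p);
    muntrace();                     // stop logging
    return nbytes;
}
```

Called from a main() built with a plain C++ compiler (not mpiCC) and run
with MALLOC_TRACE pointing at a writable file, this should leave a trace
log behind. If it does, while the MPI version still crashes, that would
be consistent with the allocator interaction described above.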


On Dec 3, 2007, at 6:23 PM, Jeffrey M Ceason wrote:

>
> I am having trouble using mtrace with Open MPI.  Whenever I call
> mtrace() before or after MPI_Init, the application terminates.  This
> only seems to happen when using MPI.  Is there a way to disable the
> Open MPI memory wrappers?  Are there known conflicts between mallopt
> settings in user applications and the mallopt settings used by
> Open MPI?
>
> Machine is AMD64 Fedora Core 7
> [ceason@n01-044-0 minib]$ uname -a
> Linux n01-044-0 2.6.22-rr #1 SMP Fri Nov 16 15:28:53 CST 2007 x86_64 x86_64 x86_64 GNU/Linux
> [ceason@n01-044-0 minib]$
>
>
> Test source.
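> #include <mcheck.h>  // mtrace()
> #include <mpi.h>     // MPI_Init(), MPI_Finalize()
> // The archive stripped the header names from the original seven
> // #include lines; only the two this test needs are restored here.
>
> using namespace std;
>
> int main (int argc, char * argv[]) {
>     mtrace();             // enable glibc malloc tracing before MPI starts
>     MPI_Init(NULL, NULL);
>
>     MPI_Finalize();
>     return 0;
> }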
>
>
> [ceason@n01-044-0 minib]$ mpiCC dacs_test.cc -o trace_test
> [ceason@n01-044-0 minib]$ mpirun -np 1 trace_test
> mpirun noticed that job rank 0 with PID 7078 on node n01-044-0 exited on signal 8 (Floating point exception).
> [ceason@n01-044-0 minib]$
>
>
> backtrace of core
>
> Core was generated by `trace_test'.
> Program terminated with signal 8, Arithmetic exception.
> #0  0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6
> (gdb) bt
> #0  0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6
> #1  0x2b33169d71c2 in _int_free () from /lib64/libc.so.6
> #2  0x2b33169dab1c in free () from /lib64/libc.so.6
> #3  0x2b33169dcee8 in tr_freehook () from /lib64/libc.so.6
> #4  0x2b33157674f3 in free () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #5  0x2b33169ceaf1 in vasprintf () from /lib64/libc.so.6
> #6  0x2b33169b3088 in asprintf () from /lib64/libc.so.6
> #7  0x2b3315760c7d in opal_output_init () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #8  0x2b3315760a2a in do_open () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #9  0x2b331575f958 in opal_malloc_init () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #10 0x2b331574ac27 in opal_init_util () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #11 0x2b331574ad06 in opal_init () from /usr/local/openmpi/lib64/libopen-pal.so.0
> #12 0x2b3315283edf in ompi_mpi_init () from /usr/local/openmpi/lib64/libmpi.so.0
> #13 0x2b33152a54f0 in PMPI_Init () from /usr/local/openmpi/lib64/libmpi.so.0
> #14 0x00408397 in main ()
> (gdb)
>
> This shouldn't involve communication between machines, but here is
> the IB info.
>
> [ceason@n01-044-0 minib]$ ibv_devinfo
> hca_id: mlx4_0
>     fw_ver:          2.2.000
>     node_guid:       0002:c903::17d0
>     sys_image_guid:  0002:c903::17d3
>     vendor_id:       0x02c9
>     vendor_part_id:  25418
>     hw_ver:          0xA0
>     board_id:        MT_04A0110002
>     phys_port_cnt:   2
>         port:   1
>             state:       PORT_ACTIVE (4)
>             max_mtu:     2048 (4)
>             active_mtu:  2048 (4)
>             sm_lid:      1
>             port_lid:    8
>             port_lmc:    0x00
>
>         port:   2
>             state:       PORT_DOWN (1)
>             max_mtu:     2048 (4)
>             active_mtu:  2048 (4)
>             sm_lid:      0
>             port_lid:    0
>             port_lmc:    0x00
>
> [ceason@n01-044-0 minib]$
>
> [ceason@n01-044-0 minib]$ ulimit -l
> unlimited
>
>
>
>
> ompi_info -all output
>
> Open MPI: 1.2.4
> Open MPI SVN revision: r16187
>

Re: [OMPI users] Open MPI can't open PML cm

2007-10-23 Thread Sajjad Tabib
Hi George,

Thanks for your response.
I found a bug in my MTL code that had propagated up to the PML, and that
was what caused the error.

Sajjad Tabib



On Wed, 17 Oct 2007 12:24:53 -0400, George Bosilca <bosi...@eecs.utk.edu>
wrote to Open MPI Users <us...@open-mpi.org>:

The CM PML only supports networks that do matching in hardware. In
Open MPI terms, the CM PML requires one of the MTLs (instead of the BTLs
used by all the other PMLs). For a full list of supported networks, one
can list the contents of the ompi/mca/mtl directory (right now: Myrinet,
Portals and QLogic).

If your environment does not have any of these networks, then CM
cannot be used. Moreover, because you force the PML to CM on the command
line and CM fails to load, Open MPI gives up, reporting that no PML
components were found.
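
The selection behaviour described above can be illustrated with a small
model. This is only a sketch under stated assumptions: the Pml struct and
the can_load and select_pml functions are hypothetical names invented
for illustration, and none of this is Open MPI's actual code.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of the behaviour described above.
// This is not Open MPI's actual selection code.
struct Pml {
    std::string name;
    bool needs_mtl;   // true for cm; false for BTL-based PMLs
};

// A PML that needs an MTL can only load if some MTL is usable.
bool can_load(const Pml& pml, const std::vector<std::string>& usable_mtls) {
    return !pml.needs_mtl || !usable_mtls.empty();
}

// Returns the name of the selected PML, or an empty string if none loads.
// A non-empty `forced` mimics "--mca pml <name>": other PMLs are skipped.
std::string select_pml(const std::vector<Pml>& pmls,
                       const std::vector<std::string>& usable_mtls,
                       const std::string& forced) {
    for (const Pml& pml : pmls) {
        if (!forced.empty() && pml.name != forced) continue;
        if (can_load(pml, usable_mtls)) return pml.name;
    }
    return "";
}
```

In this model, forcing "cm" with an empty list of usable MTLs returns an
empty result even when a BTL-based PML is present in the list, which
mirrors the "No available pml components were found" message; leaving
the choice unforced lets the BTL-based PML be picked instead.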

   Thanks,
 george.

On Oct 17, 2007, at 12:02 PM, Sajjad Tabib wrote:

>
> Hi,
>
> I am trying to use the cm component of the pml framework, but when I
> execute the command "mpirun -np 2 --mca pml cm ompi_test", I get the
> error message "No available pml components were found". I ran
> ompi_info to see whether the cm component exists, and it does. The
> output of "ompi_info | grep cm" was "MCA pml: cm (MCA v1.0, API v1.0,
> Component v1.3)". I have also set my LD_LIBRARY_PATH as instructed
> by the FAQ. I have even reconfigured and rebuilt Open MPI, but
> that didn't fix the problem either. I am wondering whether a
> process on my system could interfere with opening the cm component.
> I don't know the answer, but thought I should raise it.
>
> I am not sure what to do next to troubleshoot this problem and was
> hoping that somebody could give me pointers on what might be wrong
> or what I could check next.
>
> Thank You,
>
> Sajjad Tabib


[OMPI users] Open MPI can't open PML cm

2007-10-17 Thread Sajjad Tabib
Hi,

I am trying to use the cm component of the pml framework, but when I
execute the command "mpirun -np 2 --mca pml cm ompi_test", I get the
error message "No available pml components were found". I ran ompi_info
to see whether the cm component exists, and it does. The output of
"ompi_info | grep cm" was "MCA pml: cm (MCA v1.0, API v1.0, Component
v1.3)". I have also set my LD_LIBRARY_PATH as instructed by the FAQ. I
have even reconfigured and rebuilt Open MPI, but that didn't fix the
problem either. I am wondering whether a process on my system could
interfere with opening the cm component. I don't know the answer, but
thought I should raise it.

I am not sure what to do next to troubleshoot this problem and was
hoping that somebody could give me pointers on what might be wrong or
what I could check next.

Thank You, 

Sajjad Tabib