Brian,

To close this one off: we found that one of our libraries has its own malloc/free, and that free was being called from ompi. I should have looked at the crash reporter first. It reported:

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_INVALID_ADDRESS (0x0001) at 0x05801bfc

Thread 0 Crashed:
0   libcasa_casa.dylib          0x0107b319 free + 51
1   libopen-pal.0.dylib         0x0289eff9 opal_install_dirs_expand + 467 (installdirs_base_expand.c:68)
2   libopen-pal.0.dylib         0x0289e5a0 opal_installdirs_base_open + 1115 (installdirs_base_components.c:96)
3   libopen-pal.0.dylib         0x0287ba40 opal_init_util + 217 (opal_init.c:150)
4   libopen-pal.0.dylib         0x0287bb24 opal_init + 24 (opal_init.c:200)
5   libmpi.0.dylib              0x01d745cd ompi_mpi_init + 33 (ompi_mpi_init.c:219)
6   libmpi.0.dylib              0x01db48db MPI_Init + 293 (init.c:71)
7   ctest                       0x00002f90 main + 24 (ctest.cc:4)
8   ctest                       0x00002906 _start + 216
9   ctest                       0x0000282d start + 41

On looking into this more, we found that the casa_casa library contained a copy of Doug Lea's malloc (dlmalloc), which exports its own malloc and free. Removing it cured the problem.
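The general lesson is that a shared library should not export functions named malloc/free unless it really means to replace the allocator for the whole process, because other libraries (here libopen-pal) can end up calling them. If a library wants a private allocator, giving its entry points a library-specific prefix avoids the clash; dlmalloc itself supports this through its USE_DL_PREFIX option, which renames the routines to dlmalloc, dlfree and so on. The sketch below is a hypothetical, deliberately tiny fixed-block pool (the mylib_pool_* names are illustrative and come from neither CASA nor Open MPI) showing the prefixed-name pattern plus an ownership check that rejects pointers it did not hand out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical library-private allocator: a tiny fixed-block pool.
   Its entry points carry a library prefix (mylib_pool_*) instead of
   being named malloc/free, so no other library's call to free() can
   be bound to this code. */
#define POOL_BLOCK_SIZE 64
#define POOL_BLOCKS     16

/* The union members force each block to a generally safe alignment. */
typedef union {
    unsigned char bytes[POOL_BLOCK_SIZE];
    long double ld;
    long long ll;
    void *ptr;
} pool_block;

static pool_block pool_storage[POOL_BLOCKS];
static int pool_in_use[POOL_BLOCKS];

/* Returns a block of at least `size` bytes, or NULL if the request is
   too large or the pool is exhausted. */
void *mylib_pool_alloc(size_t size)
{
    if (size == 0 || size > POOL_BLOCK_SIZE)
        return NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!pool_in_use[i]) {
            pool_in_use[i] = 1;
            return pool_storage[i].bytes;
        }
    }
    return NULL;
}

/* Returns 1 if p is the start of a block inside this pool, else 0. */
int mylib_pool_owns(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)pool_storage;
    uintptr_t end  = base + sizeof pool_storage;
    return addr >= base && addr < end
        && (addr - base) % sizeof(pool_block) == 0;
}

/* Releases p and returns 0. A pointer this pool did not hand out, or one
   that is already free, is rejected with -1 instead of corrupting memory. */
int mylib_pool_free(void *p)
{
    if (!mylib_pool_owns(p))
        return -1;
    size_t i = ((uintptr_t)p - (uintptr_t)pool_storage) / sizeof(pool_block);
    if (!pool_in_use[i])
        return -1;
    pool_in_use[i] = 0;
    return 0;
}
```

Code inside the library calls mylib_pool_alloc/mylib_pool_free explicitly, while everything else in the process, including Open MPI, keeps using the system malloc/free.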

Thanks for the help,

Tim

On 12/07/2007, at 2:54 PM, Tim Cornwell wrote:


Brian,

I think it's just a symbol clash. A test program linked with just mpicxx works fine, but with our typical link it fails. I've narrowed the problem down to a single shared library. This is C++ code and the symbols are in the namespace casa. Weeding out all the casa stuff and some other cruft, we're left with:

0009df14 T QuantaProxy::fits()
0011277c S int __gnu_cxx::__capture_isnan<double>(double)
0014b4ae S std::invalid_argument::~invalid_argument()
0014b48e S std::invalid_argument::~invalid_argument()
00112790 S int std::isnan<double>(double)
001200e8 S void** std::fill_n<void**, unsigned int, void*>(void**, unsigned int, void* const&)
0012da12 S std::complex<double>* std::fill_n<std::complex<double>*, unsigned int, std::complex<double> >(std::complex<double>*, unsigned int, std::complex<double> const&)
0012d9ae S std::complex<float>* std::fill_n<std::complex<float>*, unsigned int, std::complex<float> >(std::complex<float>*, unsigned int, std::complex<float> const&)
00104a4c S bool* std::fill_n<bool*, unsigned int, bool>(bool*, unsigned int, bool const&)
0010b126 S double* std::fill_n<double*, unsigned int, double>(double*, unsigned int, double const&)
0012043a S float* std::fill_n<float*, unsigned int, float>(float*, unsigned int, float const&)
00120386 S int* std::fill_n<int*, unsigned int, int>(int*, unsigned int, int const&)
001203e0 S unsigned int* std::fill_n<unsigned int*, unsigned int, unsigned int>(unsigned int*, unsigned int, unsigned int const&)
00120322 S short* std::fill_n<short*, unsigned int, short>(short*, unsigned int, short const&)
0012d94a S unsigned short* std::fill_n<unsigned short*, unsigned int, unsigned short>(unsigned short*, unsigned int, unsigned short const&)
00112bf6 S void std::__reverse<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >(__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::random_access_iterator_tag)
00112bbc S __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > std::transform<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, int (*)(int)>(__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, int (*)(int))
00198740 S typeinfo for std::invalid_argument
00192cac S typeinfo name for std::invalid_argument
001993e0 S vtable for std::invalid_argument
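This list was filtered down to C++ library symbols, while the culprit turned out to be plain C allocator symbols. A check aimed directly at those would have found it. The sketch below is a hypothetical helper (defines_allocator_symbol is an illustrative name) that classifies one line of nm output; it assumes BSD-style lines of the form "address type name", as in the listing above, and allows for the leading underscore that Mach-O puts on C symbol names.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `line` (one line of nm output) shows a defined, external
   text symbol (type T) named malloc, free, calloc or realloc, else 0.
   Type U means the symbol is only referenced, and lowercase t means it
   is local to the library, so neither can clash with libc's version. */
int defines_allocator_symbol(const char *line)
{
    static const char *const names[] = { "malloc", "free", "calloc", "realloc" };
    char addr[64], type[8], name[256];

    /* Undefined symbols have no address column, so they yield only two
       fields here and are rejected. */
    if (sscanf(line, "%63s %7s %255s", addr, type, name) != 3)
        return 0;
    if (strcmp(type, "T") != 0)
        return 0;

    /* Strip the Mach-O leading underscore on C names, if present. */
    const char *bare = (name[0] == '_') ? name + 1 : name;
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(bare, names[i]) == 0)
            return 1;
    return 0;
}
```

Each line would come from something like `nm -g libcasa_casa.dylib`; any line for which the function returns 1 marks a library that can shadow the system allocator.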


We're all using the standard OS X compiler:

$ mpicxx -v
Using built-in specs.
Target: i686-apple-darwin8
Configured with: /private/var/tmp/gcc/gcc-5367.obj~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=powerpc-apple-darwin8 --with-arch=nocona --with-tune=generic --program-prefix= --host=i686-apple-darwin8 --target=i686-apple-darwin8
Thread model: posix
gcc version 4.0.1 (Apple Computer, Inc. build 5367)

Tim



On 12/07/2007, at 7:57 AM, Brian Barrett wrote:

That's unexpected.  If you run the command 'ompi_info --all', it
should list (towards the top) things like the Bindir and Libdir.  Can
you see if those have sane values?  If they do, can you try running a
simple hello, world type MPI application (there's one in the OMPI
tarball).  It almost looks like memory is getting corrupted, which
would be very unexpected that early in the process.  I'm unable to
duplicate the problem with 1.2.3 on my Mac Pro, making it all the
more strange.

Another random thought -- which compilers did you use to build Open MPI?

Brian


On Jul 11, 2007, at 1:27 PM, Tim Cornwell wrote:


                 Open MPI: 1.2.3
    Open MPI SVN revision: r15136
                 Open RTE: 1.2.3
    Open RTE SVN revision: r15136
                     OPAL: 1.2.3
        OPAL SVN revision: r15136
                   Prefix: /usr/local
  Configured architecture: i386-apple-darwin8.10.1

Hi Brian,

1.2.3 downloaded and built from source.

Tim

On 12/07/2007, at 12:50 AM, Brian Barrett wrote:

Which version of Open MPI are you using?

Thanks,

Brian

On Jul 11, 2007, at 3:32 AM, Tim Cornwell wrote:


I have a problem running Open MPI under OS X 10.4.10. My program runs
fine under Debian x86_64 on an Opteron, but under OS X on a number
of MacBooks and MacBook Pros I get the following immediately on
startup. This smells like a common problem, but I couldn't find
anything relevant anywhere. Can anyone provide a hint or, better yet,
a solution?

Thanks,

Tim


Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x0000000c
0x04510412 in free ()
(gdb) where
#0  0x04510412 in free ()
#1  0x05d24f80 in opal_install_dirs_expand (input=0x5d2a6b0 "${prefix}") at base/installdirs_base_expand.c:67
#2  0x05d24584 in opal_installdirs_base_open () at base/installdirs_base_components.c:94
#3  0x05d01a40 in opal_init_util () at runtime/opal_init.c:150
#4  0x05d01b24 in opal_init () at runtime/opal_init.c:200
#5  0x051fa5cd in ompi_mpi_init (argc=1, argv=0xbfffde74, requested=0, provided=0xbfffd930) at runtime/ompi_mpi_init.c:219
#6  0x0523a8db in MPI_Init (argc=0xbfffd980, argv=0xbfffde14) at init.c:71
#7  0x0005a03d in conrad::cp::MPIConnection::initMPI (argc=1, argv=@0xbfffde14) at mwcommon/MPIConnection.cc:83
#8  0x00004163 in main (argc=1, argv=0xbfffde74) at apps/cimager.cc:155
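Both backtraces in this thread fail in free() called from opal_install_dirs_expand while it expands the string "${prefix}", very early in MPI_Init (inside opal_init_util). Expanding install-directory templates such as "${prefix}/lib" means building new strings with malloc() and releasing them with free(), so a free() that resolves to a different allocator than the one that produced the buffer faults right there. The sketch below is a hypothetical re-creation of that kind of substitution (expand_prefix is an illustrative name, not Open MPI's code); it only shows the malloc/free pairing involved.

```c
#include <stdlib.h>
#include <string.h>

/* Replaces every occurrence of "${prefix}" in `input` with `prefix`.
   The result is a new malloc() buffer that the caller must free();
   returns NULL if the allocation fails. */
char *expand_prefix(const char *input, const char *prefix)
{
    static const char token[] = "${prefix}";
    const size_t token_len = sizeof token - 1;
    const size_t prefix_len = strlen(prefix);

    /* First pass: count the tokens so the output can be sized exactly. */
    size_t count = 0;
    for (const char *p = strstr(input, token); p != NULL;
         p = strstr(p + token_len, token))
        count++;

    size_t out_len = strlen(input) + count * prefix_len - count * token_len;
    char *out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy literal text and substitute each token. */
    char *dst = out;
    const char *src = input;
    for (const char *p = strstr(src, token); p != NULL; p = strstr(src, token)) {
        memcpy(dst, src, (size_t)(p - src));
        dst += p - src;
        memcpy(dst, prefix, prefix_len);
        dst += prefix_len;
        src = p + token_len;
    }
    strcpy(dst, src); /* trailing literal text plus the terminating NUL */
    return out;
}
```

A caller would do something like `char *libdir = expand_prefix("${prefix}/lib", "/usr/local");` and later `free(libdir);`. That final free() is the kind of call that landed in libcasa_casa's allocator in the crash reports above.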


--------------------------------------------------------------------
Tim Cornwell,  Australia Telescope National Facility, CSIRO
Location: Cnr Pembroke & Vimiera Rds, Marsfield, NSW, 2122,
AUSTRALIA
Post:     PO Box 76, Epping, NSW 1710, AUSTRALIA
Phone:    +61 2 9372 4261   Fax:  +61 2 9372 4450 or 4310
Mobile:  +61 4 3366 5399
Email:    tim.cornw...@csiro.au
URL:      http://www.atnf.csiro.au/people/tim.cornwell
--------------------------------------------------------------------



_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users