A crash inside malloc itself usually means that something earlier in your application has corrupted the heap, i.e., you have a memory error of some kind.
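As an illustration of one such error, a write just past the end of a heap buffer, here is a minimal sketch in C. The helpers `guarded_alloc` and `guard_intact` are hypothetical names made up for this example; they are not part of Open MPI or libc, and I am not claiming this is what is happening in your code. The sketch pads an allocation with a few known guard bytes so that an off-by-one write can be detected afterwards instead of silently damaging whatever follows the buffer.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers for illustration only (not Open MPI or libc API). */
#define GUARD_LEN  8
#define GUARD_BYTE 0xA5

/* Allocate n usable bytes followed by GUARD_LEN guard bytes.
 * Returns NULL if the underlying malloc fails. */
static void *guarded_alloc(size_t n)
{
    unsigned char *p = malloc(n + GUARD_LEN);
    if (p == NULL)
        return NULL;
    memset(p + n, GUARD_BYTE, GUARD_LEN);
    return p;
}

/* Return 1 if the guard bytes after the n usable bytes are unchanged,
 * 0 if something has written past the end of the usable region. */
static int guard_intact(const void *buf, size_t n)
{
    const unsigned char *g = (const unsigned char *)buf + n;
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (g[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}
```

Calling `guard_intact(buf, n)` after a suspect piece of code returns 0 if that code wrote past the `n` usable bytes. Without the padding, the same stray write would land in memory the allocator may be using for its own bookkeeping, and the failure would typically show up only in some later, unrelated malloc or free. A memory checker does this kind of detection far more thoroughly and without code changes.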
Have you tried running your application through a memory-checking debugger, such as valgrind?

On Sep 5, 2011, at 3:48 AM, Jai Dayal wrote:

> Hi all,
>
> I've been beating my head on this for quite a while now. I don't have this
> problem when running with 1,2, or 3 procs, however, once I get to 4 or
> beyond, I have a problem.
>
> When I call malloc, I get this error:
>
> txserver:11055 terminated with signal 11 at PC=2b46886bc18a SP=7fff20f51030.
> Backtrace:
> /apps/x86_64/mpi/openmpi/gcc-4.3.4/openmpi-1.4.3_oobpr/lib/libopen-pal.so.0(opal_memory_ptmalloc2_int_malloc+0x54a)[0x2b46886bc18a]
> /apps/x86_64/mpi/openmpi/gcc-4.3.4/openmpi-1.4.3_oobpr/lib/libopen-pal.so.0[0x2b46886bd4f3]
> txserver[0x415769]
> txserver[0x40da8c]
> txserver[0x4344bb]
> txserver[0x4351cd]
> txserver[0x40e3d4]
> /lib64/libc.so.6(__libc_start_main+0xf4)[0x2b468a5af994]
> txserver(_ZNSt8ios_base4InitD1Ev+0x39)[0x40c889]
>
> However, only rank 0 calls this function. Which is strange. I can just put
> a dummy malloc in there (int * dummy = (int *)malloc(10);) for example, and
> it will still crash.
>
> Again, this does not happen with n < 4 procs. The crash happens on rank 0,
> as it's the only rank that calls this code...
>
> I'm perplexed.
>
> Thanks a lot,
> J.D.

-- 
Jeff Squyres
jsquy...@cisco.com