Have you run your application through a debugger, or examined the core files to see exactly where the segv is occurring? That may shed some light on what the exact problem is.
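For example, a rough sequence for getting a backtrace out of the failing run could look like the following (my_app, the rank count, and the core file name are placeholders; how core files are named depends on the system's core_pattern setting):

    # allow core dumps in the shell that launches the job
    ulimit -c unlimited

    # reproduce the crash; each rank that hits the segv should leave a core file
    mpiexec -n 4 ./my_app

    # print the backtrace of the faulting rank from its core file
    gdb --batch -ex bt ./my_app core.12345

If you have X available, running each rank under its own debugger also works, e.g. "mpiexec -n 4 xterm -e gdb ./my_app".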
On Dec 16, 2010, at 4:20 AM, Vaz, Guilherme wrote:

> Ok, ok. It is indeed a CFD program, and Gus got it right. Number of cells per
> core means memory per core (sorry for the inaccuracy).
> My PC has 12GB of RAM. And the same calculation runs fine on an old
> Ubuntu 8.04 32-bit machine with 4GB of RAM.
> What I find strange is that the same problem runs with 1 core (without
> invoking mpiexec) and also with a large number of cores/processes, for
> instance mpiexec -n 32, but not with something in between. And it is not a
> bug in the program, because it runs on other machines and the code has not
> been changed.
>
> Any more hints?
>
> Thanks in advance.
>
> Guilherme
>
>
> dr. ir. Guilherme Vaz
> CFD Researcher
> Research & Development
> E mailto:g....@marin.nl
> T +31 317 49 33 25
>
> MARIN
> 2, Haagsteeg, P.O. Box 28, 6700 AA Wageningen, The Netherlands
> T +31 317 49 39 11, F +31 317 49 32 45, I www.marin.nl
>
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Gus Correa
> Sent: Thursday, December 16, 2010 12:46 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] segmentation fault
>
> Maybe CFD jargon?
> Perhaps the number (not the size) of cells in a mesh/grid being handled
> by each core/cpu?
>
> Ralph Castain wrote:
>> I have no idea what you mean by "cell sizes per core". Certainly not any
>> terminology within OMPI...
>>
>>
>> On Dec 15, 2010, at 3:47 PM, Vaz, Guilherme wrote:
>>
>>> Dear all,
>>>
>>> I have a problem with openmpi1.3, ifort+mkl v11.1 on Ubuntu 10.04
>>> systems (32- or 64-bit). My code worked on Ubuntu 8.04 and works on
>>> RedHat-based systems, with slightly different versions of mkl and
>>> ifort. There were no changes in the source code.
>>> The problem is that the application works for small cell sizes per
>>> core, but not for large cell sizes per core. And it always works on 1
>>> core.
>>> Example: a grid with 1.2 million cells does not work with mpiexec -n 4
>>> <my_app> but it works with mpiexec -n 32 <my_app>. It seems that there
>>> is a maximum number of cells per core. And it works with <my_app> alone
>>> (without mpiexec).
>>>
>>> Is this a stack size (or some other memory) problem? Should I set
>>> ulimit -s unlimited not only in my bashrc but also in the ssh
>>> environment (and how)? Or is it something else?
>>> Any clues/tips?
>>>
>>> Thanks for any help.
>>>
>>> Gui
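On the ulimit question quoted above: the limit has to be in effect in every shell that the MPI processes inherit, including the non-interactive shells started over ssh, which typically skip the interactive parts of ~/.bashrc. A rough sketch of two common ways to do this (assuming bash, and for the second option that sshd uses PAM; treat this as a sketch, not a recipe for your exact setup):

    # Option 1: put the call at the very top of ~/.bashrc on every node,
    # before any "return if not interactive" test, so non-interactive
    # ssh shells still execute it
    ulimit -s unlimited

    # Option 2: raise the limit system-wide through pam_limits by adding
    # these lines to /etc/security/limits.conf, then logging in again:
    #   *  soft  stack  unlimited
    #   *  hard  stack  unlimited

You can check what a remote, non-interactive shell actually sees with something like "ssh <node> ulimit -s".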
-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/