What version of OMPI are you using?

On Nov 26, 2012, at 1:02 AM, George Markomanolis <geo...@markomanolis.com> wrote:

> Dear all,
>
> First, I would like advice on how to identify the maximum number of MPI
> processes that can be executed on a node with oversubscription. When I try
> to execute an application with 4096 MPI processes on a 24-core node with
> 48GB of memory, I get the error "Unknown error: 1" even though less than
> half of the memory is in use. I can execute the same application with 2048
> MPI processes in less than one minute. I have checked the Linux settings
> for the maximum number of processes, and the limit is much larger than 4096.
>
> Another, more general question is about discovering nodes with faulty
> memory. Is there any way to identify such nodes? I found by accident that
> one node could not run an MPI application when it used more than 12GB of
> RAM, while a second node with exactly the same hardware could use all 48GB
> of memory. With 500+ nodes it is difficult to check all of them, and I am
> not familiar with any efficient solution. Initially I thought about
> memtester, but it takes a lot of time. I know this is not exactly on topic
> for this mailing list, but I thought that maybe an OpenMPI user knows
> something about it.
>
>
> Best regards,
> George Markomanolis
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
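
On the process-count question, it may also be worth looking at per-process limits other than the maximum number of processes. Below is a minimal sketch, assuming a Linux node and Python's standard `resource` module, that prints the soft and hard limits seen by the shell that launches the job. The choice of limits (user processes, open files, locked memory) is an assumption about what could matter under heavy oversubscription. Nothing in this thread confirms that any of them causes "Unknown error: 1".

```python
import resource

# Limits that can matter when many processes share one node.
# Which of these (if any) is involved here is an assumption, not a diagnosis.
LIMITS = {
    "max user processes": resource.RLIMIT_NPROC,
    "max open files": resource.RLIMIT_NOFILE,
    "max locked memory (bytes)": resource.RLIMIT_MEMLOCK,
}


def fmt(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_limits():
    """Return {name: (soft, hard)} for the current process."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


def report(limits):
    """Format one line per limit."""
    return [
        f"{name}: soft={fmt(soft)} hard={fmt(hard)}"
        for name, (soft, hard) in limits.items()
    ]


if __name__ == "__main__":
    for line in report(read_limits()):
        print(line)
```

Running this in the same environment that launches the job on both the 2048-process and 4096-process attempts shows whether either run is close to one of these limits.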