Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-19 Thread Ralph Castain
Good point William: can you rebuild OMPI with --enable-debug and run this again so we can see where the code is breaking? Thanks Ralph > On Jun 19, 2015, at 6:11 AM, Gilles Gouaillardet > wrote: > > Ralph, > > I got that, but I cannot read the stack trace (optimized build) > my best bet is
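Rebuilding Open MPI with debug symbols, as Ralph asks, follows the standard configure/make cycle. This is a minimal sketch; the source directory name and install prefix are assumptions, not details from the thread.

```shell
# Sketch of a debug rebuild of Open MPI 1.8.6 (paths are hypothetical).
cd openmpi-1.8.6
./configure --prefix=$HOME/ompi-1.8.6-debug --enable-debug
make -j4 all
make install
# Point PATH/LD_LIBRARY_PATH at the debug install before re-running the
# failing mpirun, so the backtrace shows function names and line numbers
# instead of optimized-out frames.
export PATH=$HOME/ompi-1.8.6-debug/bin:$PATH
export LD_LIBRARY_PATH=$HOME/ompi-1.8.6-debug/lib:$LD_LIBRARY_PATH
```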

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-19 Thread Gilles Gouaillardet
Ralph, I got that, but I cannot read the stack trace (optimized build). My best bet is to reproduce the issue, and then find how and why ompi_free_list_t is segfault'ing. That's why I requested info about the environment. IIRC, ompi_free_list_t are different between master and v1.8, so an incorrect

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-19 Thread Ralph Castain
Gilles I was fooled too, but that isn’t the issue. The problem is that ompi_free_list is segfaulting: > [csclprd3-0-13:30901] *** Process received signal *** > [csclprd3-0-13:30901] Signal: Bus error (7) > [csclprd3-0-13:30901] Signal code: Non-existant physical address (2) > [csclprd3-0-13:3090

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-19 Thread Gilles Gouaillardet
Lane, could you please describe your configuration ? how many sockets per node ? how many cores per socket ? how many threads per core ? what is the minimum number of nodes needed to reproduce the issue ? do all the nodes have the same configuration ? if yes, what happens without --hetero-nodes ?
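The topology questions Gilles asks (sockets per node, cores per socket, threads per core) can be answered on each CentOS node with standard Linux tools; a small sketch, assuming `lscpu` from util-linux is installed:

```shell
# Report sockets, cores per socket, and threads per core on this node.
lscpu | grep -E '^(Socket|Core|Thread)'
# Total online logical CPUs, i.e. what Open MPI counts as hwthreads.
getconf _NPROCESSORS_ONLN
```

Running this on every node (e.g. via pdsh or a loop over the hostfile) also answers whether all nodes share the same configuration, which bears on the `--hetero-nodes` question.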

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-19 Thread Ralph Castain
Ah crud - my bad for not looking closely enough at your original backtrace. I’m so used to seeing these issues as having to do with binding when I see that hwthreads-as-cpus flag :-) This has nothing to do with binding etc - something appears wrong in the ompi_free_list code. I’ll have to defer

Re: [OMPI users] Fwd[2]: OMPI yalla vs impi

2015-06-19 Thread Timur Ismagilov
Hello, Alina! I use "OSU MPI Multiple Bandwidth / Message Rate Test v4.4.1". I downloaded it from the website: http://mvapich.cse.ohio-state.edu/benchmarks/ I have attached "osu_mbw_mr.c" to this letter. Best regards, Timur Thursday, June 18, 2015, 18:23 +03:00 from Alina Sklarevich : >Hi Timur, >

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-19 Thread Lane, William
Ralph, I created a hostfile that just has the names of the hosts while specifying no slot information whatsoever (e.g. csclprd3-0-0) and received the following errors: mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/ --hostfile hostfile-noslots --mca btl_tcp_if_include eth0
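A hostfile of the kind William describes is just one hostname per line with no slot counts. A minimal sketch using the node-name pattern from the thread (the exact host list is an assumption, and the mpirun command is reproduced only as far as the archive preserves it):

```shell
# Create a slot-less hostfile: one hostname per line, nothing else.
cat > hostfile-noslots <<'EOF'
csclprd3-0-0
csclprd3-0-1
csclprd3-0-2
EOF
cat hostfile-noslots
# The launch from the thread would then be (truncated in the archive, not run here):
# mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/ \
#        --hostfile hostfile-noslots --mca btl_tcp_if_include eth0 ...
```

Without slot counts, Open MPI 1.8.x infers each node's slot count from its detected core count, so oversubscription errors at -np 132 depend on the aggregate cores across the listed hosts.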