Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Jeff Squyres (jsquyres) via users
Noam and I actually talked on the phone (what!?) and worked through this a bit more. Oddly, he can generate core files if he runs in /tmp, but not if he runs in an NFS-mounted directory (!). I haven't seen that before -- if someone knows why that would happen, I'd love to hear the explanat

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Noam Bernstein
> On Jul 12, 2018, at 11:58 AM, Jeff Squyres (jsquyres) > wrote: > > > > (You may have already done this; I just want to make sure we're on the same > sheet of music here…) I’m not talking about the job script or shell startup files. The actual “executable” passed to mpirun on the command

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Jeff Squyres (jsquyres) via users
On Jul 12, 2018, at 11:45 AM, Noam Bernstein wrote: > >> E.g., if you "ulimit -c" in your interactive shell and see "unlimited", but >> if you "ulimit -c" in a launched job and see "0", then the job scheduler is >> doing that to your environment somewhere. > > I am using a scheduler (torque),
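
The comparison described above (the limit seen in an interactive shell versus the limit seen inside a launched job) can be sketched with a small script that reports the core-file limit of whatever process runs it. This is only an illustration, not something posted in the thread: the script name check_core_limit.py and the helper names are hypothetical, and it uses Python's standard resource module rather than the shell's ulimit builtin.

```python
import resource


def describe_limit(value):
    """Render an rlimit value roughly the way `ulimit` prints it."""
    # Note: getrlimit reports bytes, while a shell's `ulimit -c` may report blocks.
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def core_limits():
    """Return the (soft, hard) core-file size limits of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = core_limits()
    print(f"core file size: soft={soft} hard={hard}")
```

Running it once directly in the interactive shell and once as the program launched by mpirun inside the batch job (for example, mpirun -np 1 python3 check_core_limit.py) shows whether the limit is being changed somewhere between the two. A soft limit of 0 in the launched process means no core file will be written, whatever the interactive shell reports.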

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Noam Bernstein
> On Jul 12, 2018, at 11:02 AM, Jeff Squyres (jsquyres) > wrote: > > On Jul 12, 2018, at 10:59 AM, Noam Bernstein > wrote: >> >>> Do you get core files? >>> >>> Loading up the core file in a debugger might give us more information. >> >> No, I don’t, despite setting "ulimit -c unlimited"

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Jeff Squyres (jsquyres) via users
On Jul 12, 2018, at 10:59 AM, Noam Bernstein wrote: > >> Do you get core files? >> >> Loading up the core file in a debugger might give us more information. > > No, I don’t, despite setting "ulimit -c unlimited". I’m not sure what’s > going on with that (or the lack of line info in the sta

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Noam Bernstein
> On Jul 12, 2018, at 10:51 AM, Jeff Squyres (jsquyres) via users > wrote: > > Do you get core files? > > Loading up the core file in a debugger might give us more information. No, I don’t, despite setting "ulimit -c unlimited". I’m not sure what’s going on with that (or the lack of line i

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Jeff Squyres (jsquyres) via users
Do you get core files? Loading up the core file in a debugger might give us more information. > On Jul 12, 2018, at 9:35 AM, Noam Bernstein > wrote: > > >> On Jul 12, 2018, at 8:37 AM, Noam Bernstein >> wrote: >> >> I’m going to try the 3.1.x 20180710 nightly snapshot next. > > Same be

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Noam Bernstein
> On Jul 12, 2018, at 8:37 AM, Noam Bernstein > wrote: > > I’m going to try the 3.1.x 20180710 nightly snapshot next. Same behavior, exactly - segfault, no debugging info beyond the vasp routine that calls mpi_allreduce.

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Noam Bernstein
I’ve recompiled 3.1.1 with --enable-debug --enable-mem-debug, and I still get no detailed information from the mpi libraries, only VASP (as before): ldd (at runtime, so I’m fairly sure it’s referring to the right executable and LD_LIBRARY_PATH) info: vexec /usr/local/vasp/bin/5.4.4/0test/vasp.gamm

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Åke Sandgren
Are you running with ulimit -s unlimited? If not, that looks like an out-of-stack crash, which VASP frequently causes. If you are running with unlimited stack, I could perhaps run that input case on our VASP build (which has a bunch of fixes for bad stack usage, among other things). On 07/11/2018 1
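
As a rough illustration of the stack-limit suggestion, the sketch below raises the soft stack limit of the current process to its hard limit, which is the most an unprivileged process can do, and reports the result. It is an assumption-laden example rather than anything from the thread: the function name raise_stack_limit is hypothetical, and whether the hard limit is actually unlimited depends on how the system and the scheduler are configured.

```python
import resource


def raise_stack_limit():
    """Raise the soft stack limit to the hard limit; return the (soft, hard) pair now in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != hard:
        try:
            # An unprivileged process may raise its soft limit up to the hard limit.
            resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
        except (ValueError, OSError):
            # The platform refused the change; leave the existing limit in place.
            pass
    return resource.getrlimit(resource.RLIMIT_STACK)


if __name__ == "__main__":
    soft, hard = raise_stack_limit()
    unlimited = soft == resource.RLIM_INFINITY
    print("stack size:", "unlimited" if unlimited else f"{soft} bytes")
```

The new limit applies to programs started from this process afterwards, so a wrapper like this would have to launch the real executable itself to have any effect on it. If the printed value is still finite, the hard limit is the constraint and has to be changed outside the job.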