Re: [OMPI users] running OpenMPI jobs (either 1.10.1 or 1.8.7) on SoGE more problems

2016-03-17 Thread Dave Love
Ralph Castain writes:

> That’s an SGE error message - looks like your tmp file system on one
> of the remote nodes is full.

Yes; surely that just needs to be fixed, and I'd expect the host not to accept jobs in that state. It's not just going to break ompi.

> We don’t

Re: [OMPI users] running OpenMPI jobs (either 1.10.1 or 1.8.7) on SoGE more problems

2016-03-16 Thread Ralph Castain
That’s an SGE error message - looks like your tmp file system on one of the remote nodes is full. We don’t control where SGE puts its files, but it might be that your backend nodes are having issues with us doing a tree-based launch (i.e., where each backend daemon launches more daemons along

[OMPI users] running OpenMPI jobs (either 1.10.1 or 1.8.7) on SoGE more problems

2016-03-16 Thread Lane, William
I'm getting an error message early on:

[csclprd3-0-11:17355] [[36373,0],17] plm:rsh: using "/opt/sge/bin/lx-amd64/qrsh -inherit -nostdin -V -verbose" for launching
unable to write to file /tmp/285019.1.verylong.q/qrsh_error: No space left on device
[csclprd3-6-10:18352] [[36373,0],21] plm:rsh:
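
Since the "No space left on device" message in this thread points at a full tmp file system on a compute node, a quick free-space check run on each suspect node may help confirm the diagnosis. The following is only a minimal sketch: the helper names `free_bytes` and `has_space` and the 100 MiB threshold are illustrative assumptions, not part of Open MPI or SGE, and the directory SGE actually uses on a given cluster may differ from the system default temp directory.

```python
import shutil
import tempfile


def free_bytes(path):
    """Return the number of free bytes on the file system holding `path`."""
    return shutil.disk_usage(path).free


def has_space(path, min_free_bytes):
    """Return True if at least `min_free_bytes` are free under `path`."""
    return free_bytes(path) >= min_free_bytes


if __name__ == "__main__":
    # Assumption: the scheduler writes under the system temp directory
    # (usually /tmp); pass another path if your site configures one.
    tmp_dir = tempfile.gettempdir()
    threshold = 100 * 1024 * 1024  # hypothetical 100 MiB minimum
    status = "ok" if has_space(tmp_dir, threshold) else "LOW"
    print(f"{tmp_dir}: {free_bytes(tmp_dir)} bytes free ({status})")
```

Running this on the nodes named in the error output would show whether the tmp file system there is the one that is full.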