Re: [OMPI users] job termination

2019-04-17 Thread John Hearns via users
I would do the normal things. Log into those nodes. Run dmesg and look at /var/log/messages. Look at the Slurm log on the node and look for the job ending. Also look at the sysstat files and see if there was a lot of memory being used: http://sebastien.godard.pagesperso-orange.fr/ On Wed, 17 Apr

[OMPI users] job termination

2019-04-17 Thread Mahmood Naderan
Hi, A QuantumEspresso, multinode and multiprocess MPI job has been terminated with the following messages in the log file: total cpu time spent up to now is 63540.4 secs total energy = -14004.61932175 Ry Harris-Foulkes estimate = -14004.73511665 Ry

Re: [OMPI users] job termination on grid

2013-04-30 Thread Ralph Castain
On Apr 30, 2013, at 1:54 PM, Vladimir Yamshchikov wrote: > This is the question I am trying to answer - how many threads I can use with > blastx on a grid? If I could request resources by_node, use -pernode option > to have one process per node, and then specify the correct

Re: [OMPI users] job termination on grid

2013-04-30 Thread Vladimir Yamshchikov
This is the question I am trying to answer - how many threads I can use with blastx on a grid? If I could request resources by_node, use -pernode option to have one process per node, and then specify the correct number of threads for each node. But I cannot, resources (slots) are requested per-core

Re: [OMPI users] job termination on grid

2013-04-30 Thread Ralph Castain
On Apr 30, 2013, at 1:34 PM, Vladimir Yamshchikov wrote: > I asked grid IT and they said they had to kill it as the job was overloading > nodes. They saw loads up to 180 instead of close to 12 on 12-core nodes. They > think that blastx is not an openmpi application, so

Re: [OMPI users] job termination on grid

2013-04-30 Thread Vladimir Yamshchikov
I asked grid IT and they said they had to kill it as the job was overloading nodes. They saw loads up to 180 instead of close to 12 on 12-core nodes. They think that blastx is not an openmpi application, so Open MPI is spawning between 64 and 96 blastx processes, each of which is then starting up 96
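
The overload described above comes down to simple arithmetic: the number of runnable threads on a node is roughly the number of processes placed there times the threads each one starts. A minimal sketch of that budget follows, assuming the 12-core nodes mentioned in the message. The function names and the even-split policy are hypothetical illustrations, not part of Open MPI, blastx, or the grid scheduler.

```python
def threads_per_process(cores_per_node: int, processes_per_node: int) -> int:
    """Split a node's cores evenly among the processes placed on it.

    Hypothetical helper: returns at least 1 so every process can run,
    even if that oversubscribes the node.
    """
    if cores_per_node < 1 or processes_per_node < 1:
        raise ValueError("both arguments must be positive")
    return max(1, cores_per_node // processes_per_node)


def runnable_threads(processes_per_node: int, threads_each: int) -> int:
    """Total threads competing for the node's cores."""
    return processes_per_node * threads_each


cores = 12  # node size quoted in the message

# One process per node can use every core.
print(threads_per_process(cores, 1))

# Twelve processes per node leave one thread each.
print(threads_per_process(cores, 12))

# Two processes each starting 96 threads (an illustrative placement)
# already far exceed 12 cores.
print(runnable_threads(2, 96))
```

Keeping the product of processes per node and threads per process at or below the core count is the point of the sketch; how the thread count is passed to the application depends on that application.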

Re: [OMPI users] job termination on grid

2013-04-30 Thread Reuti
Hi, On 30.04.2013 at 21:26, Vladimir Yamshchikov wrote: > My recent job started normally but after a few hours of running died with the > following message: > > -- > A daemon (pid 19390) died unexpectedly with status 137

[OMPI users] job termination on grid

2013-04-30 Thread Vladimir Yamshchikov
Hello, My recent job started normally but after a few hours of running died with the following message: -- A daemon (pid 19390) died unexpectedly with status 137 while attempting to launch so we are aborting. There
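
A status of 137 can be read with the common POSIX shell convention that a status above 128 means "terminated by signal (status - 128)". The sketch below applies that convention. It assumes the reported status follows it, which the message itself does not confirm, and `describe_exit_status` is a hypothetical helper name. It also assumes a POSIX system where signal 9 is defined.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (assumed 128 + N convention)."""
    if status > 128:
        signum = status - 128
        try:
            # Map the number to a name such as SIGKILL when the platform knows it.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


print(describe_exit_status(137))
```

Under that convention 137 is 128 + 9, i.e. SIGKILL, which a process cannot catch; it is typically sent by an administrator, a batch system, or the kernel's out-of-memory killer.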