Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm
On Tue, Jul 30, 2019 at 3:29 PM Carlos Navarro wrote:
>
> Hi all,
> First of all, thanks for all your valuable inputs!
> I tried Szilárd's suggestion (multi simulations) with the following commands
> (using a single node):
>
> EXE="mpirun -np 4 gmx_mpi mdrun "
>
> cd $WORKDIR0
> #$DO_PARALLEL
> $EXE -s 4q.tpr -deffnm 4q -dlb yes -resethway -multidir 1 2 3 4
>
> And I noticed that the performance went from 37, 32, 23 and 22 ns/day to
> ~42 ns/day in all four simulations. I checked that the 80 processors were
> being used 100% of the time, while the GPU was used at about 50% (down from
> ~70% when running a single simulation on the node, where I obtain a
> performance of ~50 ns/day).

Great! Note that optimizing hardware utilization doesn't always maximize
performance.

Also, manual launches with pinoffset/pinstride will give exactly the same
performance as the multi runs *if* you get the affinities right. In your
original commands you tried to use 20 of the 80 hardware threads per rank,
but you offset the runs by only 10 hardware threads, which means that the
runs were overlapping and interfering with each other as well as ending up
under-utilizing the hardware.

> So overall I'm quite happy with the performance I'm getting now; and
> honestly, I don't know if at some point I can get the same performance
> (running 4 jobs) that I'm getting running just one.

No, but you _may_ get a bit more aggregate performance if you run 8
concurrent jobs. You can also try one thread per core ("mpirun -np 4 gmx_mpi
mdrun -multi 4 -ntomp 10 -pin on" to use only half of the threads).

Cheers,
--
Szilárd
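For reference, a minimal sketch of those two suggestions, using -multidir as
Carlos already does. It assumes the same MPI build (gmx_mpi) and per-run
directories named 1..8, each containing its own 4q.tpr (directories 5-8 are
hypothetical here); mdrun should then distribute the node's four GPUs over
the runs on its own:

  # 4 runs with one OpenMP thread per physical core (40 of the 80 hardware threads)
  mpirun -np 4 gmx_mpi mdrun -multidir 1 2 3 4 -ntomp 10 -pin on -s 4q.tpr -deffnm 4q

  # 8 concurrent runs with 10 hardware threads each (all 80 hardware threads)
  mpirun -np 8 gmx_mpi mdrun -multidir 1 2 3 4 5 6 7 8 -ntomp 10 -pin on -s 4q.tpr -deffnm 4q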
Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm
Hi all,
First of all, thanks for all your valuable inputs!
I tried Szilárd's suggestion (multi simulations) with the following commands
(using a single node):

EXE="mpirun -np 4 gmx_mpi mdrun "

cd $WORKDIR0
#$DO_PARALLEL
$EXE -s 4q.tpr -deffnm 4q -dlb yes -resethway -multidir 1 2 3 4

And I noticed that the performance went from 37, 32, 23 and 22 ns/day to ~42
ns/day in all four simulations. I checked that the 80 processors were being
used 100% of the time, while the GPU was used at about 50% (down from ~70%
when running a single simulation on the node, where I obtain a performance of
~50 ns/day).
So overall I'm quite happy with the performance I'm getting now; and honestly,
I don't know if at some point I can get the same performance (running 4 jobs)
that I'm getting running just one.
Best regards,
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl
Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm
Hi,

Yes, and the -nmpi flag I copied from Carlos's post is ineffective - use
-ntmpi.

Mark
Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm
Carlos,

You can accomplish the same using the multi-simulation feature of mdrun and
avoid having to manage the placement of the runs manually; e.g. instead of
launching each run separately as above, you just write

mpirun -np N gmx_mpi mdrun -multidir $WORKDIR1 $WORKDIR2 $WORKDIR3 ...

For more details see
http://manual.gromacs.org/documentation/current/user-guide/mdrun-features.html#running-multi-simulations

Note that if the different runs have different speeds, just as with your
manual launch, your machine can end up partially utilized when some of the
runs finish.

Cheers,
--
Szilárd
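To make this concrete, below is a sketch of how the original batch script in
this thread could be collapsed into a single multi-simulation launch. It
assumes an MPI-enabled build is available as gmx_mpi on top of the modules
Carlos already loads (if not, the GROMACS module name would need to change),
and it reuses his directories and file names:

  #!/bin/bash -x
  #SBATCH --job-name=testAtTPC1
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=4
  #SBATCH --cpus-per-task=20
  #SBATCH --gres=gpu:4
  #SBATCH --time=00:10:00
  #SBATCH --partition=develgpus

  module use /gpfs/software/juwels/otherstages
  module load Stages/2018b
  module load Intel/2019.0.117-GCC-7.3.0
  module load IntelMPI/2019.0.117
  module load GROMACS/2018.3

  BASE=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu

  # One MPI rank per simulation; -multidir changes into each directory, so
  # -s/-deffnm refer to files inside them. mdrun spreads the four runs over
  # the node's four GPUs.
  mpirun -np 4 gmx_mpi mdrun -multidir $BASE/1 $BASE/2 $BASE/3 $BASE/4 \
         -s eq6.tpr -deffnm eq6-20 -ntomp 20 -pin on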
Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm
On 7/29/19 8:46 AM, Carlos Navarro wrote:
> Hi Mark,
> I tried that before, but unfortunately in that case (removing --gres=gpu:1
> and including the -gpu_id flag in each line) for some reason the jobs are
> run one at a time (one after the other), so I can't use the whole node
> properly.

You need to run all but the last mdrun process in the background (&).

-Justin
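In other words, the launch pattern inside one batch script is to put the
mdrun processes in the background and keep the script alive until they are
done, otherwise the job ends (and Slurm kills the runs) as soon as the script
reaches its last line. A minimal sketch, reusing the variables from Carlos's
script; the trailing 'wait' is the equivalent of leaving the last run in the
foreground:

  cd $WORKDIR1
  $EXE -s eq6.tpr -deffnm eq6-20 ... &> log &   # '&' lets the script continue
  cd $WORKDIR2
  $EXE -s eq6.tpr -deffnm eq6-20 ... &> log &
  # ... remaining runs ...
  wait   # block until all background mdrun processes have finished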
Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm
Hi Mark,
I tried that before, but unfortunately in that case (removing --gres=gpu:1 and
including the -gpu_id flag in each line) for some reason the jobs are run one
at a time (one after the other), so I can't use the whole node properly.

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl
Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm
Hi,

When you use

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "

then the environment seems to make sure only one GPU is visible. (The log
files report only finding one GPU.) But it's probably the same GPU in each
case, with three remaining idle. I would suggest not using --gres unless you
can specify *which* of the four available GPUs each run can use.

Otherwise, don't use --gres and use the facilities built into GROMACS, e.g.

$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0 -ntomp 20 -gpu_id 0
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10 -ntomp 20 -gpu_id 1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20 -ntomp 20 -gpu_id 2
etc.

Mark
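Putting this together with the corrections elsewhere in the thread (use
-ntmpi rather than -nmpi, background each run, and, since the node has 80
hardware threads, offset four 20-thread runs by 20 rather than 10), a
corrected manual launch might look like the sketch below. The explicit
-pinstride 1 is an assumption about the intended layout: it keeps each run's
threads on a contiguous block of hardware threads so the four runs do not
overlap:

  EXE=" gmx mdrun "

  cd $WORKDIR1
  $EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinoffset 0  -pinstride 1 -gpu_id 0 &> log &
  cd $WORKDIR2
  $EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinoffset 20 -pinstride 1 -gpu_id 1 &> log &
  cd $WORKDIR3
  $EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinoffset 40 -pinstride 1 -gpu_id 2 &> log &
  cd $WORKDIR4
  $EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinoffset 60 -pinstride 1 -gpu_id 3 &> log &
  wait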
Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm
Hi Szilárd,
To answer your questions:

** are you trying to run multiple simulations concurrently on the same node,
or are you trying to strong-scale?
I'm trying to run multiple simulations on the same node at the same time.

** what are you simulating?
Regular and CompEl simulations.

** can you provide log files of the runs?
In the following link are some log files:
https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0
In short, alone.log -> a single run on the node (using 1 GPU);
multi1/2/3/4.log -> 4 independent simulations run at the same time on a
single node. In all cases, 20 CPUs are used.

Best regards,
Carlos
Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm
Hi,

It is not clear to me how you are trying to set up your runs, so please
provide some details:
- are you trying to run multiple simulations concurrently on the same node,
or are you trying to strong-scale?
- what are you simulating?
- can you provide log files of the runs?

Cheers,

--
Szilárd
Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm
No one can give me an idea of what can be happening? Or how I can solve it?
Best regards,
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 19, 2019 at 2:20:41 PM, Carlos Navarro (carlos.navarr...@gmail.com) wrote:

Dear gmx-users,
I'm currently working on a server where each node possesses 40 physical cores
(40 threads) and 4 Nvidia V100 GPUs.
When I launch a single job (1 simulation using a single GPU card) I get a
performance of about ~35 ns/day in a system of about 300k atoms. Looking into
the usage of the video card during the simulation, I notice that the card is
being used at about ~80%.
The problems arise when I increase the number of jobs running at the same
time. If, for instance, 2 jobs are running at the same time, the performance
drops to ~25 ns/day each, and the usage of the video cards also drops during
the simulation to about 30-40% (sometimes dropping to less than 5%).
Clearly there is a communication problem between the GPU cards and the CPU
during the simulations, but I don't know how to solve this.
Here is the script I use to run the simulations:

#!/bin/bash -x
#SBATCH --job-name=testAtTPC1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=20
#SBATCH --account=hdd22
#SBATCH --nodes=1
#SBATCH --mem=0
#SBATCH --output=sout.%j
#SBATCH --error=s4err.%j
#SBATCH --time=00:10:00
#SBATCH --partition=develgpus
#SBATCH --gres=gpu:4

module use /gpfs/software/juwels/otherstages
module load Stages/2018b
module load Intel/2019.0.117-GCC-7.3.0
module load IntelMPI/2019.0.117
module load GROMACS/2018.3

WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
EXE=" gmx mdrun "

cd $WORKDIR1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0 -ntomp 20 &>log &
cd $WORKDIR2
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10 -ntomp 20 &>log &
cd $WORKDIR3
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20 -ntomp 20 &>log &
cd $WORKDIR4
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30 -ntomp 20 &>log &

Regarding the pinoffset, I first tried using 20 cores for each job but then
also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for job 2,
pinoffset 8 for job 3 and pinoffset 12 for job 4), but in the end the problem
persists.

Currently on this machine I'm not able to use more than 1 GPU per job, so
this is my only choice to use the whole node properly.
If you need more information please just let me know.
Best regards.
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl