Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-30 Thread Szilárd Páll
On Tue, Jul 30, 2019 at 3:29 PM Carlos Navarro
 wrote:
>
> Hi all,
> First of all, thanks for all your valuable input!
> I tried Szilárd's suggestion (multi-simulations) with the following commands
> (using a single node):
>
> EXE="mpirun -np 4 gmx_mpi mdrun "
>
> cd $WORKDIR0
> #$DO_PARALLEL
> $EXE -s 4q.tpr -deffnm 4q -dlb yes -resethway -multidir 1 2 3 4
> And I noticed that the performance went from 37, 32, 23, and 22 ns/day to ~42
> ns/day in all four simulations. I checked that the 80 processors were being
> used 100% of the time, while the GPU usage was about 50% (down from ~70%
> when running a single simulation on the node, where I obtain a performance
> of ~50 ns/day).

Great!

Note that optimizing hardware utilization doesn't always maximize performance.

Also, manual launches with pinoffset/pinstride will give exactly the
same performance as the multi runs *if* you get the affinities right.
In your original commands you tried to use 20 of the 80 hardware
threads per rank, but you offset the runs by only 10 hardware threads,
which means the runs were overlapping and interfering with each other
while also under-utilizing the hardware.
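
For reference, a minimal sketch of what correctly offset manual launches
could look like on such a node (assuming 40 cores / 80 hardware threads,
20 threads per run, the thread-MPI "gmx mdrun" binary from your original
script, and no --gres so that all four GPUs stay visible; eq6.tpr and the
WORKDIR variables are the ones from that script):

# sketch: 4 concurrent runs x 20 threads on 80 hardware threads, one GPU each
cd $WORKDIR1
gmx mdrun -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinstride 1 -pinoffset 0 -gpu_id 0 &> log &
cd $WORKDIR2
gmx mdrun -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinstride 1 -pinoffset 20 -gpu_id 1 &> log &
cd $WORKDIR3
gmx mdrun -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinstride 1 -pinoffset 40 -gpu_id 2 &> log &
cd $WORKDIR4
gmx mdrun -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinstride 1 -pinoffset 60 -gpu_id 3 &> log &
wait

With -pinstride 1 each run is pinned to 20 consecutive hardware threads
(10 physical cores, both hyperthreads), so offsets of 0/20/40/60 keep the
four runs on disjoint cores.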

> So overall I'm quite happy with the performance I'm getting now; honestly,
> I don't know whether, running 4 jobs at once, I can ever reach the same
> per-job performance that I get running just one.

No, but you _may_ get a bit more aggregate performance if you run 8
concurrent jobs. Also, you can try 1 thread per core ("mpirun -np 4
gmx_mpi mdrun -multi 4 -ntomp 10 -pin on") to use only half of the
threads.
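
For illustration, a hedged sketch of what the 8-concurrent-run variant
could look like with -multidir, assuming eight run directories named
1..8 exist (5..8 are hypothetical additions here) and each contains
4q.tpr:

# sketch: 8 concurrent runs, 10 OpenMP threads each, two runs sharing each of the 4 GPUs
mpirun -np 8 gmx_mpi mdrun -s 4q.tpr -deffnm 4q -pin on -ntomp 10 -multidir 1 2 3 4 5 6 7 8

mdrun should then handle the pin offsets and the GPU assignment of the
eight members itself.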

Cheers,
--
Szilárd

> Best regards,
> Carlos
>
> ——
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
>
> On July 29, 2019 at 6:11:31 PM, Mark Abraham (mark.j.abra...@gmail.com)
> wrote:
>
> Hi,
>
> Yes and the -nmpi I copied from Carlos's post is ineffective - use -ntmpi
>
> Mark
>
>
> On Mon., 29 Jul. 2019, 15:15 Justin Lemkul,  wrote:
>
> >
> >
> > On 7/29/19 8:46 AM, Carlos Navarro wrote:
> > > Hi Mark,
> > > I tried that before, but unfortunately in that case (removing
> --gres=gpu:1
> > > and including in each line the -gpu_id flag) for some reason the jobs
> are
> > > run one at a time (one after the other), so I can’t use properly the
> > whole
> > > node.
> > >
> >
> > You need to run all but the last mdrun process in the background (&).
> >
> > -Justin
> >
> > > ——
> > > Carlos Navarro Retamal
> > > Bioinformatic Engineering. PhD.
> > > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > > Simulations
> > > Universidad de Talca
> > > Av. Lircay S/N, Talca, Chile
> > > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > >
> > > On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
> > > wrote:
> > >
> > > Hi,
> > >
> > > When you use
> > >
> > > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> > >
> > > then the environment seems to make sure only one GPU is visible. (The
> log
> > > files report only finding one GPU.) But it's probably the same GPU in
> > each
> > > case, with three remaining idle. I would suggest not using --gres unless
> > > you can specify *which* of the four available GPUs each run can use.
> > >
> > > Otherwise, don't use --gres and use the facilities built into GROMACS,
> > e.g.
> > >
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> > > -ntomp 20 -gpu_id 0
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
> 10
> > > -ntomp 20 -gpu_id 1
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
> 20
> > > -ntomp 20 -gpu_id 2
> > > etc.
> > >
> > > Mark
> > >
> > > On Mon, 29 Jul 2019 at 11:34, Carlos Navarro  > >
> > > wrote:
> > >
> > >> Hi Szilárd,
> > >> To answer your questions:
> > >> **are you trying to run multiple simulations concurrently on the same
> > >> node or are you trying to strong-scale?
> > >> I'm trying to run multiple simulations on the same node at the same
> > time.
> > >>
> > >> ** what are you simulating?
> > >> Regular and CompEl simulations
> > >>
> > >> ** can you provide log files of the runs?
> > >> In the following link are some logs files:
> > >> https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> > >> In short, alone.log -> single run in the node (using 1 gpu).
> > >> multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> > >> single node. In all cases, 20 cpus are used.
> > >> Best regards,
> > >> Carlos
> > >>
> > >> El jue., 25 jul. 2019 a las 10:59, Szilárd Páll (<
> > pall.szil...@gmail.com>)
> > >> escribió:
> > >>
> > >>> Hi,
> > >>>
> > >>> It is not clear to me how are you trying to set up your runs, so
> > >>> please provide some details:
> > >>> - are you trying to run multiple simulations concurrently on the same
> > >>> node or are you trying to st

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-30 Thread Carlos Navarro
Hi all,
First of all, thanks for all your valuable input!
I tried Szilárd's suggestion (multi-simulations) with the following commands
(using a single node):

EXE="mpirun -np 4 gmx_mpi mdrun "

cd $WORKDIR0
#$DO_PARALLEL
$EXE -s 4q.tpr -deffnm 4q -dlb yes -resethway -multidir 1 2 3 4
And I noticed that the performance went from 37, 32, 23, and 22 ns/day to ~42
ns/day in all four simulations. I checked that the 80 processors were being
used 100% of the time, while the GPU usage was about 50% (down from ~70%
when running a single simulation on the node, where I obtain a performance
of ~50 ns/day).
So overall I'm quite happy with the performance I'm getting now; honestly,
I don't know whether, running 4 jobs at once, I can ever reach the same
per-job performance that I get running just one.
Best regards,
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 29, 2019 at 6:11:31 PM, Mark Abraham (mark.j.abra...@gmail.com)
wrote:

Hi,

Yes and the -nmpi I copied from Carlos's post is ineffective - use -ntmpi

Mark


On Mon., 29 Jul. 2019, 15:15 Justin Lemkul,  wrote:

>
>
> On 7/29/19 8:46 AM, Carlos Navarro wrote:
> > Hi Mark,
> > I tried that before, but unfortunately in that case (removing
--gres=gpu:1
> > and including in each line the -gpu_id flag) for some reason the jobs
are
> > run one at a time (one after the other), so I can’t use properly the
> whole
> > node.
> >
>
> You need to run all but the last mdrun process in the background (&).
>
> -Justin
>
> > ——
> > Carlos Navarro Retamal
> > Bioinformatic Engineering. PhD.
> > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > Simulations
> > Universidad de Talca
> > Av. Lircay S/N, Talca, Chile
> > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> >
> > On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
> > wrote:
> >
> > Hi,
> >
> > When you use
> >
> > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> >
> > then the environment seems to make sure only one GPU is visible. (The
log
> > files report only finding one GPU.) But it's probably the same GPU in
> each
> > case, with three remaining idle. I would suggest not using --gres unless
> > you can specify *which* of the four available GPUs each run can use.
> >
> > Otherwise, don't use --gres and use the facilities built into GROMACS,
> e.g.
> >
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> > -ntomp 20 -gpu_id 0
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
10
> > -ntomp 20 -gpu_id 1
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
20
> > -ntomp 20 -gpu_id 2
> > etc.
> >
> > Mark
> >
> > On Mon, 29 Jul 2019 at 11:34, Carlos Navarro  >
> > wrote:
> >
> >> Hi Szilárd,
> >> To answer your questions:
> >> **are you trying to run multiple simulations concurrently on the same
> >> node or are you trying to strong-scale?
> >> I'm trying to run multiple simulations on the same node at the same
> time.
> >>
> >> ** what are you simulating?
> >> Regular and CompEl simulations
> >>
> >> ** can you provide log files of the runs?
> >> In the following link are some logs files:
> >> https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> >> In short, alone.log -> single run in the node (using 1 gpu).
> >> multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> >> single node. In all cases, 20 cpus are used.
> >> Best regards,
> >> Carlos
> >>
> >> El jue., 25 jul. 2019 a las 10:59, Szilárd Páll (<
> pall.szil...@gmail.com>)
> >> escribió:
> >>
> >>> Hi,
> >>>
> >>> It is not clear to me how are you trying to set up your runs, so
> >>> please provide some details:
> >>> - are you trying to run multiple simulations concurrently on the same
> >>> node or are you trying to strong-scale?
> >>> - what are you simulating?
> >>> - can you provide log files of the runs?
> >>>
> >>> Cheers,
> >>>
> >>> --
> >>> Szilárd
> >>>
> >>> On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> >>>  wrote:
>  No one can give me an idea of what can be happening? Or how I can
> > solve
> >>> it?
>  Best regards,
>  Carlos
> 
>  ——
>  Carlos Navarro Retamal
>  Bioinformatic Engineering. PhD.
>  Postdoctoral Researcher in Center of Bioinformatics and Molecular
>  Simulations
>  Universidad de Talca
>  Av. Lircay S/N, Talca, Chile
>  E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> 
>  On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> >>> carlos.navarr...@gmail.com)
>  wrote:
> 
>  Dear gmx-users,
>  I’m currently working in a server where each node posses 40 physical
> >>> cores
>  (40 threads) and 4 Nvidia-V100.
>  When I launch a single job (1 simulation using a single gpu card) I
> >> get a
>  performance 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Mark Abraham
Hi,

Yes, and the -nmpi flag I copied from Carlos's post is ineffective; use -ntmpi instead.
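
As a sketch, one of the launch lines from the original script with the
flag corrected would read (using the $DO_PARALLEL and $EXE variables
defined there):

# corrected flag: -ntmpi (thread-MPI ranks) instead of the unrecognized -nmpi
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -pin on -pinoffset 0 -ntomp 20 -gpu_id 0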

Mark


On Mon., 29 Jul. 2019, 15:15 Justin Lemkul,  wrote:

>
>
> On 7/29/19 8:46 AM, Carlos Navarro wrote:
> > Hi Mark,
> > I tried that before, but unfortunately in that case (removing --gres=gpu:1
> > and including in each line the -gpu_id flag) for some reason the jobs are
> > run one at a time (one after the other), so I can’t use properly the
> whole
> > node.
> >
>
> You need to run all but the last mdrun process in the background (&).
>
> -Justin
>
> > ——
> > Carlos Navarro Retamal
> > Bioinformatic Engineering. PhD.
> > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > Simulations
> > Universidad de Talca
> > Av. Lircay S/N, Talca, Chile
> > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> >
> > On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
> > wrote:
> >
> > Hi,
> >
> > When you use
> >
> > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> >
> > then the environment seems to make sure only one GPU is visible. (The log
> > files report only finding one GPU.) But it's probably the same GPU in
> each
> > case, with three remaining idle. I would suggest not using --gres unless
> > you can specify *which* of the four available GPUs each run can use.
> >
> > Otherwise, don't use --gres and use the facilities built into GROMACS,
> e.g.
> >
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> > -ntomp 20 -gpu_id 0
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
> > -ntomp 20 -gpu_id 1
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20
> > -ntomp 20 -gpu_id 2
> > etc.
> >
> > Mark
> >
> > On Mon, 29 Jul 2019 at 11:34, Carlos Navarro  >
> > wrote:
> >
> >> Hi Szilárd,
> >> To answer your questions:
> >> **are you trying to run multiple simulations concurrently on the same
> >> node or are you trying to strong-scale?
> >> I'm trying to run multiple simulations on the same node at the same
> time.
> >>
> >> ** what are you simulating?
> >> Regular and CompEl simulations
> >>
> >> ** can you provide log files of the runs?
> >> In the following link are some logs files:
> >> https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> >> In short, alone.log -> single run in the node (using 1 gpu).
> >> multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> >> single node. In all cases, 20 cpus are used.
> >> Best regards,
> >> Carlos
> >>
> >> El jue., 25 jul. 2019 a las 10:59, Szilárd Páll (<
> pall.szil...@gmail.com>)
> >> escribió:
> >>
> >>> Hi,
> >>>
> >>> It is not clear to me how are you trying to set up your runs, so
> >>> please provide some details:
> >>> - are you trying to run multiple simulations concurrently on the same
> >>> node or are you trying to strong-scale?
> >>> - what are you simulating?
> >>> - can you provide log files of the runs?
> >>>
> >>> Cheers,
> >>>
> >>> --
> >>> Szilárd
> >>>
> >>> On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> >>>  wrote:
>  No one can give me an idea of what can be happening? Or how I can
> > solve
> >>> it?
>  Best regards,
>  Carlos
> 
>  ——
>  Carlos Navarro Retamal
>  Bioinformatic Engineering. PhD.
>  Postdoctoral Researcher in Center of Bioinformatics and Molecular
>  Simulations
>  Universidad de Talca
>  Av. Lircay S/N, Talca, Chile
>  E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> 
>  On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> >>> carlos.navarr...@gmail.com)
>  wrote:
> 
>  Dear gmx-users,
>  I’m currently working in a server where each node posses 40 physical
> >>> cores
>  (40 threads) and 4 Nvidia-V100.
>  When I launch a single job (1 simulation using a single gpu card) I
> >> get a
>  performance of about ~35ns/day in a system of about 300k atoms.
> > Looking
>  into the usage of the video card during the simulation I notice that
> >> the
>  card is being used about and ~80%.
>  The problems arise when I increase the number of jobs running at the
> >> same
>  time. If for instance 2 jobs are running at the same time, the
> >>> performance
>  drops to ~25ns/day each and the usage of the video cards also drops
> >>> during
>  the simulation to about a ~30-40% (and sometimes dropping to less than
> >>> 5%).
>  Clearly there is a communication problem between the gpu cards and the
> >>> cpu
>  during the simulations, but I don’t know how to solve this.
>  Here is the script I use to run the simulations:
> 
>  #!/bin/bash -x
>  #SBATCH --job-name=testAtTPC1
>  #SBATCH --ntasks-per-node=4
>  #SBATCH --cpus-per-task=20
>  #SBATCH --account=hdd22
>  #SBATCH --nodes=1
>  #SBATCH --mem=0
>  #SBATCH --output=sout.%j
>  #SBATCH --error=s4err.%j
>  #SBATCH --time=00:10:00
>  #SBATCH --partition=develgpus
>  #SBATCH --gres=gpu:4
> 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Szilárd Páll
Carlos,

You can accomplish the same thing using the multi-simulation feature of
mdrun and avoid having to manually manage the placement of runs; e.g.,
instead of the above you just write
mpirun -np N gmx_mpi mdrun -multidir $WORKDIR1 $WORKDIR2 $WORKDIR3 ...
For more details see
http://manual.gromacs.org/documentation/current/user-guide/mdrun-features.html#running-multi-simulations
Note that if the different runs proceed at different speeds then, just as
with your manual launch, your machine can end up partially utilized once
some of the runs finish.
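
As a concrete sketch, the four srun launches from the original batch
script could collapse into a single call along these lines (assuming all
four directories contain eq6.tpr; mpirun is an assumption here, and an
equivalent srun launch with 4 tasks may also work on this cluster):

# sketch: one multi-simulation replacing the four separate srun launches
mpirun -np 4 gmx_mpi mdrun -s eq6.tpr -deffnm eq6-20 -pin on -ntomp 20 -multidir $WORKDIR1 $WORKDIR2 $WORKDIR3 $WORKDIR4

mdrun should then place the four members on the node, set the pin
offsets, and assign one of the four GPUs to each member without further
flags.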

Cheers,
--
Szilárd

On Mon, Jul 29, 2019 at 2:46 PM Carlos Navarro
 wrote:
>
> Hi Mark,
> I tried that before, but unfortunately in that case (removing --gres=gpu:1
> and including in each line the -gpu_id flag) for some reason the jobs are
> run one at a time (one after the other), so I can’t use properly the whole
> node.
>
>
> ——
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
>
> On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
> wrote:
>
> Hi,
>
> When you use
>
> DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
>
> then the environment seems to make sure only one GPU is visible. (The log
> files report only finding one GPU.) But it's probably the same GPU in each
> case, with three remaining idle. I would suggest not using --gres unless
> you can specify *which* of the four available GPUs each run can use.
>
> Otherwise, don't use --gres and use the facilities built into GROMACS, e.g.
>
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> -ntomp 20 -gpu_id 0
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
> -ntomp 20 -gpu_id 1
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20
> -ntomp 20 -gpu_id 2
> etc.
>
> Mark
>
> On Mon, 29 Jul 2019 at 11:34, Carlos Navarro 
> wrote:
>
> > Hi Szilárd,
> > To answer your questions:
> > **are you trying to run multiple simulations concurrently on the same
> > node or are you trying to strong-scale?
> > I'm trying to run multiple simulations on the same node at the same time.
> >
> > ** what are you simulating?
> > Regular and CompEl simulations
> >
> > ** can you provide log files of the runs?
> > In the following link are some logs files:
> > https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> > In short, alone.log -> single run in the node (using 1 gpu).
> > multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> > single node. In all cases, 20 cpus are used.
> > Best regards,
> > Carlos
> >
> > El jue., 25 jul. 2019 a las 10:59, Szilárd Páll ()
> > escribió:
> >
> > > Hi,
> > >
> > > It is not clear to me how are you trying to set up your runs, so
> > > please provide some details:
> > > - are you trying to run multiple simulations concurrently on the same
> > > node or are you trying to strong-scale?
> > > - what are you simulating?
> > > - can you provide log files of the runs?
> > >
> > > Cheers,
> > >
> > > --
> > > Szilárd
> > >
> > > On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> > >  wrote:
> > > >
> > > > No one can give me an idea of what can be happening? Or how I can
> solve
> > > it?
> > > > Best regards,
> > > > Carlos
> > > >
> > > > ——
> > > > Carlos Navarro Retamal
> > > > Bioinformatic Engineering. PhD.
> > > > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > > > Simulations
> > > > Universidad de Talca
> > > > Av. Lircay S/N, Talca, Chile
> > > > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > > >
> > > > On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> > > carlos.navarr...@gmail.com)
> > > > wrote:
> > > >
> > > > Dear gmx-users,
> > > > I’m currently working in a server where each node posses 40 physical
> > > cores
> > > > (40 threads) and 4 Nvidia-V100.
> > > > When I launch a single job (1 simulation using a single gpu card) I
> > get a
> > > > performance of about ~35ns/day in a system of about 300k atoms.
> Looking
> > > > into the usage of the video card during the simulation I notice that
> > the
> > > > card is being used about and ~80%.
> > > > The problems arise when I increase the number of jobs running at the
> > same
> > > > time. If for instance 2 jobs are running at the same time, the
> > > performance
> > > > drops to ~25ns/day each and the usage of the video cards also drops
> > > during
> > > > the simulation to about a ~30-40% (and sometimes dropping to less than
> > > 5%).
> > > > Clearly there is a communication problem between the gpu cards and the
> > > cpu
> > > > during the simulations, but I don’t know how to solve this.
> > > > Here is the script I use to run the simulations:
> > > >
> > > > #!/bin/bash -x
> > > > #SBATCH --job-name=testAtTPC1
> > > > #SBATCH --ntasks-per-node=4
> > > > #SBATCH --cpus-per-task=20
> > > > #SBATCH 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Justin Lemkul



On 7/29/19 8:46 AM, Carlos Navarro wrote:

Hi Mark,
I tried that before, but unfortunately in that case (removing --gres=gpu:1
and including the -gpu_id flag in each line) for some reason the jobs are
run one at a time (one after the other), so I can't properly use the whole
node.



You need to run all but the last mdrun process in the background (&).

-Justin
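
For illustration, a minimal sketch of that pattern with the variables
from the script quoted below (pinning flags omitted here for brevity;
the trailing & backgrounds each launch and the final wait keeps the
batch job alive until all runs finish):

cd $WORKDIR1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -ntomp 20 -gpu_id 0 &> log &
cd $WORKDIR2
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -ntomp 20 -gpu_id 1 &> log &
# ... likewise for $WORKDIR3 and $WORKDIR4 ...
wait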


——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
wrote:

Hi,

When you use

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "

then the environment seems to make sure only one GPU is visible. (The log
files report only finding one GPU.) But it's probably the same GPU in each
case, with three remaining idle. I would suggest not using --gres unless
you can specify *which* of the four available GPUs each run can use.

Otherwise, don't use --gres and use the facilities built into GROMACS, e.g.

$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
-ntomp 20 -gpu_id 0
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
-ntomp 20 -gpu_id 1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20
-ntomp 20 -gpu_id 2
etc.

Mark

On Mon, 29 Jul 2019 at 11:34, Carlos Navarro 
wrote:


Hi Szilárd,
To answer your questions:
**are you trying to run multiple simulations concurrently on the same
node or are you trying to strong-scale?
I'm trying to run multiple simulations on the same node at the same time.

** what are you simulating?
Regular and CompEl simulations

** can you provide log files of the runs?
In the following link are some logs files:
https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
In short, alone.log -> single run in the node (using 1 gpu).
multi1/2/3/4.log ->4 independent simulations ran at the same time in a
single node. In all cases, 20 cpus are used.
Best regards,
Carlos

El jue., 25 jul. 2019 a las 10:59, Szilárd Páll ()
escribió:


Hi,

It is not clear to me how are you trying to set up your runs, so
please provide some details:
- are you trying to run multiple simulations concurrently on the same
node or are you trying to strong-scale?
- what are you simulating?
- can you provide log files of the runs?

Cheers,

--
Szilárd

On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
 wrote:

No one can give me an idea of what can be happening? Or how I can

solve

it?

Best regards,
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 19, 2019 at 2:20:41 PM, Carlos Navarro (

carlos.navarr...@gmail.com)

wrote:

Dear gmx-users,
I’m currently working in a server where each node posses 40 physical

cores

(40 threads) and 4 Nvidia-V100.
When I launch a single job (1 simulation using a single gpu card) I

get a

performance of about ~35ns/day in a system of about 300k atoms.

Looking

into the usage of the video card during the simulation I notice that

the

card is being used about and ~80%.
The problems arise when I increase the number of jobs running at the

same

time. If for instance 2 jobs are running at the same time, the

performance

drops to ~25ns/day each and the usage of the video cards also drops

during

the simulation to about a ~30-40% (and sometimes dropping to less than

5%).

Clearly there is a communication problem between the gpu cards and the

cpu

during the simulations, but I don’t know how to solve this.
Here is the script I use to run the simulations:

#!/bin/bash -x
#SBATCH --job-name=testAtTPC1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=20
#SBATCH --account=hdd22
#SBATCH --nodes=1
#SBATCH --mem=0
#SBATCH --output=sout.%j
#SBATCH --error=s4err.%j
#SBATCH --time=00:10:00
#SBATCH --partition=develgpus
#SBATCH --gres=gpu:4

module use /gpfs/software/juwels/otherstages
module load Stages/2018b
module load Intel/2019.0.117-GCC-7.3.0
module load IntelMPI/2019.0.117
module load GROMACS/2018.3

WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
EXE=" gmx mdrun "

cd $WORKDIR1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset

0

-ntomp 20 &>log &
cd $WORKDIR2
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset

10

-ntomp 20 &>log &
cd $WORKDIR3
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset

20

-ntomp 20 &>log &
cd $WORKDIR4
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Carlos Navarro
Hi Mark,
I tried that before, but unfortunately in that case (removing --gres=gpu:1
and including the -gpu_id flag in each line) for some reason the jobs are
run one at a time (one after the other), so I can't properly use the whole
node.


——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
wrote:

Hi,

When you use

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "

then the environment seems to make sure only one GPU is visible. (The log
files report only finding one GPU.) But it's probably the same GPU in each
case, with three remaining idle. I would suggest not using --gres unless
you can specify *which* of the four available GPUs each run can use.

Otherwise, don't use --gres and use the facilities built into GROMACS, e.g.

$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
-ntomp 20 -gpu_id 0
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
-ntomp 20 -gpu_id 1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20
-ntomp 20 -gpu_id 2
etc.

Mark

On Mon, 29 Jul 2019 at 11:34, Carlos Navarro 
wrote:

> Hi Szilárd,
> To answer your questions:
> **are you trying to run multiple simulations concurrently on the same
> node or are you trying to strong-scale?
> I'm trying to run multiple simulations on the same node at the same time.
>
> ** what are you simulating?
> Regular and CompEl simulations
>
> ** can you provide log files of the runs?
> In the following link are some logs files:
> https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> In short, alone.log -> single run in the node (using 1 gpu).
> multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> single node. In all cases, 20 cpus are used.
> Best regards,
> Carlos
>
> El jue., 25 jul. 2019 a las 10:59, Szilárd Páll ()
> escribió:
>
> > Hi,
> >
> > It is not clear to me how are you trying to set up your runs, so
> > please provide some details:
> > - are you trying to run multiple simulations concurrently on the same
> > node or are you trying to strong-scale?
> > - what are you simulating?
> > - can you provide log files of the runs?
> >
> > Cheers,
> >
> > --
> > Szilárd
> >
> > On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> >  wrote:
> > >
> > > No one can give me an idea of what can be happening? Or how I can
solve
> > it?
> > > Best regards,
> > > Carlos
> > >
> > > ——
> > > Carlos Navarro Retamal
> > > Bioinformatic Engineering. PhD.
> > > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > > Simulations
> > > Universidad de Talca
> > > Av. Lircay S/N, Talca, Chile
> > > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > >
> > > On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> > carlos.navarr...@gmail.com)
> > > wrote:
> > >
> > > Dear gmx-users,
> > > I’m currently working in a server where each node posses 40 physical
> > cores
> > > (40 threads) and 4 Nvidia-V100.
> > > When I launch a single job (1 simulation using a single gpu card) I
> get a
> > > performance of about ~35ns/day in a system of about 300k atoms.
Looking
> > > into the usage of the video card during the simulation I notice that
> the
> > > card is being used about and ~80%.
> > > The problems arise when I increase the number of jobs running at the
> same
> > > time. If for instance 2 jobs are running at the same time, the
> > performance
> > > drops to ~25ns/day each and the usage of the video cards also drops
> > during
> > > the simulation to about a ~30-40% (and sometimes dropping to less than
> > 5%).
> > > Clearly there is a communication problem between the gpu cards and the
> > cpu
> > > during the simulations, but I don’t know how to solve this.
> > > Here is the script I use to run the simulations:
> > >
> > > #!/bin/bash -x
> > > #SBATCH --job-name=testAtTPC1
> > > #SBATCH --ntasks-per-node=4
> > > #SBATCH --cpus-per-task=20
> > > #SBATCH --account=hdd22
> > > #SBATCH --nodes=1
> > > #SBATCH --mem=0
> > > #SBATCH --output=sout.%j
> > > #SBATCH --error=s4err.%j
> > > #SBATCH --time=00:10:00
> > > #SBATCH --partition=develgpus
> > > #SBATCH --gres=gpu:4
> > >
> > > module use /gpfs/software/juwels/otherstages
> > > module load Stages/2018b
> > > module load Intel/2019.0.117-GCC-7.3.0
> > > module load IntelMPI/2019.0.117
> > > module load GROMACS/2018.3
> > >
> > > WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> > > WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> > > WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> > > WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
> > >
> > > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> > > EXE=" gmx mdrun "
> > >
> > > cd $WORKDIR1
> > > $DO_PARALLEL $EXE -s eq6.tpr -deff

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Mark Abraham
Hi,

When you use

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "

then the environment seems to make sure only one GPU is visible. (The log
files report only finding one GPU.) But it's probably the same GPU in each
case, with three remaining idle. I would suggest not using --gres unless
you can specify *which* of the four available GPUs each run can use.

Otherwise, don't use --gres and use the facilities built into GROMACS, e.g.

$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
-ntomp 20 -gpu_id 0
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
-ntomp 20 -gpu_id 1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset 20
-ntomp 20 -gpu_id 2
etc.

Mark

On Mon, 29 Jul 2019 at 11:34, Carlos Navarro 
wrote:

> Hi Szilárd,
> To answer your questions:
> **are you trying to run multiple simulations concurrently on the same
> node or are you trying to strong-scale?
> I'm trying to run multiple simulations on the same node at the same time.
>
> ** what are you simulating?
> Regular and CompEl simulations
>
> ** can you provide log files of the runs?
> In the following link are some logs files:
> https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> In short, alone.log -> single run in the node (using 1 gpu).
> multi1/2/3/4.log ->4 independent simulations ran at the same time in a
> single node. In all cases, 20 cpus are used.
> Best regards,
> Carlos
>
> El jue., 25 jul. 2019 a las 10:59, Szilárd Páll ()
> escribió:
>
> > Hi,
> >
> > It is not clear to me how are you trying to set up your runs, so
> > please provide some details:
> > - are you trying to run multiple simulations concurrently on the same
> > node or are you trying to strong-scale?
> > - what are you simulating?
> > - can you provide log files of the runs?
> >
> > Cheers,
> >
> > --
> > Szilárd
> >
> > On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> >  wrote:
> > >
> > > No one can give me an idea of what can be happening? Or how I can solve
> > it?
> > > Best regards,
> > > Carlos
> > >
> > > ——
> > > Carlos Navarro Retamal
> > > Bioinformatic Engineering. PhD.
> > > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > > Simulations
> > > Universidad de Talca
> > > Av. Lircay S/N, Talca, Chile
> > > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > >
> > > On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> > carlos.navarr...@gmail.com)
> > > wrote:
> > >
> > > Dear gmx-users,
> > > I’m currently working in a server where each node posses 40 physical
> > cores
> > > (40 threads) and 4 Nvidia-V100.
> > > When I launch a single job (1 simulation using a single gpu card) I
> get a
> > > performance of about ~35ns/day in a system of about 300k atoms. Looking
> > > into the usage of the video card during the simulation I notice that
> the
> > > card is being used about and ~80%.
> > > The problems arise when I increase the number of jobs running at the
> same
> > > time. If for instance 2 jobs are running at the same time, the
> > performance
> > > drops to ~25ns/day each and the usage of the video cards also drops
> > during
> > > the simulation to about a ~30-40% (and sometimes dropping to less than
> > 5%).
> > > Clearly there is a communication problem between the gpu cards and the
> > cpu
> > > during the simulations, but I don’t know how to solve this.
> > > Here is the script I use to run the simulations:
> > >
> > > #!/bin/bash -x
> > > #SBATCH --job-name=testAtTPC1
> > > #SBATCH --ntasks-per-node=4
> > > #SBATCH --cpus-per-task=20
> > > #SBATCH --account=hdd22
> > > #SBATCH --nodes=1
> > > #SBATCH --mem=0
> > > #SBATCH --output=sout.%j
> > > #SBATCH --error=s4err.%j
> > > #SBATCH --time=00:10:00
> > > #SBATCH --partition=develgpus
> > > #SBATCH --gres=gpu:4
> > >
> > > module use /gpfs/software/juwels/otherstages
> > > module load Stages/2018b
> > > module load Intel/2019.0.117-GCC-7.3.0
> > > module load IntelMPI/2019.0.117
> > > module load GROMACS/2018.3
> > >
> > > WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> > > WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> > > WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> > > WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
> > >
> > > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> > > EXE=" gmx mdrun "
> > >
> > > cd $WORKDIR1
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
> 0
> > > -ntomp 20 &>log &
> > > cd $WORKDIR2
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
> 10
> > > -ntomp 20 &>log &
> > > cd $WORKDIR3
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset
> > 20
> > > -ntomp 20 &>log &
> > > cd $WORKDIR4
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset
> 30
> > > -ntomp 20 &>log &
> > >
> > >
> > > Regarding to pinoffset, I first tried using 20 cores for each job but
> > then
> > > also tried with 8

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-29 Thread Carlos Navarro
Hi Szilárd,
To answer your questions:
**are you trying to run multiple simulations concurrently on the same
node or are you trying to strong-scale?
I'm trying to run multiple simulations on the same node at the same time.

** what are you simulating?
Regular and CompEl simulations

** can you provide log files of the runs?
Some log files are available at the following link:
https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
In short, alone.log -> a single run on the node (using 1 GPU);
multi1/2/3/4.log -> 4 independent simulations run at the same time on a
single node. In all cases, 20 CPUs are used.
Best regards,
Carlos

El jue., 25 jul. 2019 a las 10:59, Szilárd Páll ()
escribió:

> Hi,
>
> It is not clear to me how are you trying to set up your runs, so
> please provide some details:
> - are you trying to run multiple simulations concurrently on the same
> node or are you trying to strong-scale?
> - what are you simulating?
> - can you provide log files of the runs?
>
> Cheers,
>
> --
> Szilárd
>
> On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
>  wrote:
> >
> > No one can give me an idea of what can be happening? Or how I can solve
> it?
> > Best regards,
> > Carlos
> >
> > ——
> > Carlos Navarro Retamal
> > Bioinformatic Engineering. PhD.
> > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > Simulations
> > Universidad de Talca
> > Av. Lircay S/N, Talca, Chile
> > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> >
> > On July 19, 2019 at 2:20:41 PM, Carlos Navarro (
> carlos.navarr...@gmail.com)
> > wrote:
> >
> > Dear gmx-users,
> > I’m currently working in a server where each node posses 40 physical
> cores
> > (40 threads) and 4 Nvidia-V100.
> > When I launch a single job (1 simulation using a single gpu card) I get a
> > performance of about ~35ns/day in a system of about 300k atoms. Looking
> > into the usage of the video card during the simulation I notice that the
> > card is being used about and ~80%.
> > The problems arise when I increase the number of jobs running at the same
> > time. If for instance 2 jobs are running at the same time, the
> performance
> > drops to ~25ns/day each and the usage of the video cards also drops
> during
> > the simulation to about a ~30-40% (and sometimes dropping to less than
> 5%).
> > Clearly there is a communication problem between the gpu cards and the
> cpu
> > during the simulations, but I don’t know how to solve this.
> > Here is the script I use to run the simulations:
> >
> > #!/bin/bash -x
> > #SBATCH --job-name=testAtTPC1
> > #SBATCH --ntasks-per-node=4
> > #SBATCH --cpus-per-task=20
> > #SBATCH --account=hdd22
> > #SBATCH --nodes=1
> > #SBATCH --mem=0
> > #SBATCH --output=sout.%j
> > #SBATCH --error=s4err.%j
> > #SBATCH --time=00:10:00
> > #SBATCH --partition=develgpus
> > #SBATCH --gres=gpu:4
> >
> > module use /gpfs/software/juwels/otherstages
> > module load Stages/2018b
> > module load Intel/2019.0.117-GCC-7.3.0
> > module load IntelMPI/2019.0.117
> > module load GROMACS/2018.3
> >
> > WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> > WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> > WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> > WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
> >
> > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> > EXE=" gmx mdrun "
> >
> > cd $WORKDIR1
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> > -ntomp 20 &>log &
> > cd $WORKDIR2
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
> > -ntomp 20 &>log &
> > cd $WORKDIR3
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset
> 20
> > -ntomp 20 &>log &
> > cd $WORKDIR4
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30
> > -ntomp 20 &>log &
> >
> >
> > Regarding to pinoffset, I first tried using 20 cores for each job but
> then
> > also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for job 2,
> > pinoffset 8 for job 3 and pinoffset 12 for job) but at the end the
> problem
> > persist.
> >
> > Currently in this machine I’m not able to use more than 1 gpu per job, so
> > this is my only choice to use properly the whole node.
> > If you need more information please just let me know.
> > Best regards.
> > Carlos
> >
> > ——
> > Carlos Navarro Retamal
> > Bioinformatic Engineering. PhD.
> > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > Simulations
> > Universidad de Talca
> > Av. Lircay S/N, Talca, Chile
> > E: carlos.navarr...@gmail.com or cnava...@utalca.cl

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-25 Thread Szilárd Páll
Hi,

It is not clear to me how you are trying to set up your runs, so
please provide some details:
- are you trying to run multiple simulations concurrently on the same
node or are you trying to strong-scale?
- what are you simulating?
- can you provide log files of the runs?

Cheers,

--
Szilárd

On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
 wrote:
>
> No one can give me an idea of what can be happening? Or how I can solve it?
> Best regards,
> Carlos
>
> ——
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
>
> On July 19, 2019 at 2:20:41 PM, Carlos Navarro (carlos.navarr...@gmail.com)
> wrote:
>
> Dear gmx-users,
> I’m currently working in a server where each node posses 40 physical cores
> (40 threads) and 4 Nvidia-V100.
> When I launch a single job (1 simulation using a single gpu card) I get a
> performance of about ~35ns/day in a system of about 300k atoms. Looking
> into the usage of the video card during the simulation I notice that the
> card is being used about and ~80%.
> The problems arise when I increase the number of jobs running at the same
> time. If for instance 2 jobs are running at the same time, the performance
> drops to ~25ns/day each and the usage of the video cards also drops during
> the simulation to about a ~30-40% (and sometimes dropping to less than 5%).
> Clearly there is a communication problem between the gpu cards and the cpu
> during the simulations, but I don’t know how to solve this.
> Here is the script I use to run the simulations:
>
> #!/bin/bash -x
> #SBATCH --job-name=testAtTPC1
> #SBATCH --ntasks-per-node=4
> #SBATCH --cpus-per-task=20
> #SBATCH --account=hdd22
> #SBATCH --nodes=1
> #SBATCH --mem=0
> #SBATCH --output=sout.%j
> #SBATCH --error=s4err.%j
> #SBATCH --time=00:10:00
> #SBATCH --partition=develgpus
> #SBATCH --gres=gpu:4
>
> module use /gpfs/software/juwels/otherstages
> module load Stages/2018b
> module load Intel/2019.0.117-GCC-7.3.0
> module load IntelMPI/2019.0.117
> module load GROMACS/2018.3
>
> WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
>
> DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> EXE=" gmx mdrun "
>
> cd $WORKDIR1
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> -ntomp 20 &>log &
> cd $WORKDIR2
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
> -ntomp 20 &>log &
> cd $WORKDIR3
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset 20
> -ntomp 20 &>log &
> cd $WORKDIR4
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30
> -ntomp 20 &>log &
>
>
> Regarding to pinoffset, I first tried using 20 cores for each job but then
> also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for job 2,
> pinoffset 8 for job 3 and pinoffset 12 for job) but at the end the problem
> persist.
>
> Currently in this machine I’m not able to use more than 1 gpu per job, so
> this is my only choice to use properly the whole node.
> If you need more information please just let me know.
> Best regards.
> Carlos
>
> ——
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: carlos.navarr...@gmail.com or cnava...@utalca.cl

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-22 Thread Carlos Navarro
Can no one give me an idea of what might be happening, or how I can solve it?
Best regards,
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 19, 2019 at 2:20:41 PM, Carlos Navarro (carlos.navarr...@gmail.com)
wrote:

Dear gmx-users,
I'm currently working on a server where each node has 40 physical cores
(40 threads) and 4 NVIDIA V100 GPUs.
When I launch a single job (1 simulation using a single GPU card) I get a
performance of ~35 ns/day on a system of about 300k atoms. Looking at the
usage of the video card during the simulation, I notice that the card is
being used at about 80%.
The problems arise when I increase the number of jobs running at the same
time. If, for instance, 2 jobs are running at the same time, the performance
drops to ~25 ns/day each, and the usage of the video cards also drops during
the simulation to about 30-40% (sometimes to less than 5%).
Clearly there is a communication problem between the gpu cards and the cpu
during the simulations, but I don’t know how to solve this.
Here is the script I use to run the simulations:

#!/bin/bash -x
#SBATCH --job-name=testAtTPC1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=20
#SBATCH --account=hdd22
#SBATCH --nodes=1
#SBATCH --mem=0
#SBATCH --output=sout.%j
#SBATCH --error=s4err.%j
#SBATCH --time=00:10:00
#SBATCH --partition=develgpus
#SBATCH --gres=gpu:4

module use /gpfs/software/juwels/otherstages
module load Stages/2018b
module load Intel/2019.0.117-GCC-7.3.0
module load IntelMPI/2019.0.117
module load GROMACS/2018.3

WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
EXE=" gmx mdrun "

cd $WORKDIR1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
-ntomp 20 &>log &
cd $WORKDIR2
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
-ntomp 20 &>log &
cd $WORKDIR3
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset 20
-ntomp 20 &>log &
cd $WORKDIR4
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30
-ntomp 20 &>log &


Regarding pinoffset, I first tried using 20 cores for each job, but then
also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for job 2,
pinoffset 8 for job 3, and pinoffset 12 for job 4), but in the end the
problem persists.

Currently on this machine I'm not able to use more than 1 GPU per job, so
this is my only way to properly use the whole node.
If you need more information please just let me know.
Best regards.
Carlos

——
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl