[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slower than with mpirun)

2013-08-07 Thread Christopher Samuel


On 07/08/13 16:19, Christopher Samuel wrote:

 Anyone seen anything similar, or any ideas on what could be going
 on?

Sorry, this was with:

# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30

Since those initial tests we've started enforcing memory limits (the
system is not yet in full production) and found that this causes jobs
to get killed.

We tried the cgroup gathering method, but jobs still die with mpirun,
and now the numbers don't seem to be right for either mpirun or srun:

mpirun (killed):

[samuel@barcoo-test Mem]$ sacct -j 94564 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94564
94564.batch    -523362K          0
94564.0         394525K          0

srun:

[samuel@barcoo-test Mem]$ sacct -j 94565 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94565
94565.batch        998K          0
94565.0          88663K          0
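
For completeness, the cgroup variant we tested looks roughly like the
following (typed up from memory, so treat the exact lines as indicative
rather than a paste of our config):

# slurm.conf -- ACCOUNTING (cgroup variant)
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherFrequency=30
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

# cgroup.conf -- memory enforcement
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes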


All the best,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au      Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/          http://twitter.com/vlsci



[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slower than with mpirun)

2013-08-07 Thread Janne Blomqvist


On 2013-08-07 09:19, Christopher Samuel wrote:



On 23/07/13 17:06, Christopher Samuel wrote:


Bringing up a new IBM SandyBridge cluster I'm running a NAMD test
case and noticed that if I run it with srun rather than mpirun it
goes over 20% slower.


Following on from this issue, we've found that whilst mpirun gives
acceptable performance, the memory accounting doesn't appear to be correct.

Anyone seen anything similar, or any ideas on what could be going on?


See my message from yesterday

https://groups.google.com/d/msg/slurm-devel/BlZ2-NwwCCg/03DnMEWYHqUJ

for what I think is the reason. That is, the memory accounting is per 
task, and when launching using mpirun the number of tasks does not 
correspond to the number of MPI processes, but rather to the number of 
orted processes (1 per node).
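
You can check this from the accounting data itself with something like
the following (field names may vary a little between sacct versions, so
adjust as needed):

sacct -j <jobid> -o JobID,NTasks,NNodes,MaxRSS,MaxRSSTask,MaxRSSNode

For the mpirun case I would expect NTasks for the step to equal the
number of nodes (one orted per node), while for the srun case it should
equal the number of MPI ranks.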




Here are two identical NAMD jobs running over 69 nodes using 16 cores
per node; this one was launched with mpirun (Open-MPI 1.6.5):


== slurm-94491.out ==
WallClock: 101.176193  CPUTime: 101.176193  Memory: 1268.554688 MB
End of program

[samuel@barcoo-test Mem]$ sacct -j 94491 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94491
94491.batch    6504068K  11167820K
94491.0        5952048K   9028060K


This one launched with srun (about 60% slower):

== slurm-94505.out ==
WallClock: 163.314163  CPUTime: 163.314163  Memory: 1253.511719 MB
End of program

[samuel@barcoo-test Mem]$ sacct -j 94505 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94505
94505.batch        7248K   1582692K
94505.0         1022744K   1307112K



cheers!
Chris
- --
  Christopher Samuel        Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au      Phone: +61 (0)3 903 55545
  http://www.vlsci.org.au/          http://twitter.com/vlsci





--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & BECS
+358503841576 || janne.blomqv...@aalto.fi


[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slower than with mpirun)

2013-08-07 Thread Christopher Samuel


On 07/08/13 16:59, Janne Blomqvist wrote:

 That is, the memory accounting is per task, and when launching
 using mpirun the number of tasks does not correspond to the number
 of MPI processes, but rather to the number of orted processes (1
 per node).

That appears to be correct: I see 1 batch task and 68 orted tasks when
I use mpirun, whereas I see 1 batch task and 1104 namd2 tasks when I
use srun.

I can understand how that might result in Slurm (wrongly) concluding
that a single task is using more than its allowed memory per task, but
I'm not sure how it could lead to Slurm thinking the job is using
vastly more memory than it actually is.
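
Doing some rough arithmetic on the mpirun job from my earlier mail does
at least look consistent with the accounting summing each orted's whole
process tree (assuming the 16 namd2 ranks on a node are all charged to
that node's single orted task):

# mpirun job 94491, step 0: MaxRSS 5952048K ~= 5.8 GB for one orted task
#   NAMD itself reports ~1.25 GB (presumably for the rank that prints it);
#   5.8 GB - 1.25 GB leaves ~4.5 GB across the other 15 ranks on that
#   node, i.e. roughly 300 MB each, which seems plausible for this run
# srun job 94505, step 0: MaxRSS 1022744K ~= 1.0 GB, i.e. just the
#   single largest namd2 task rather than a per-node total

If that's right, the per-task figure under mpirun is really a per-node
total, which would explain the jobs being killed by the per-task limit.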


cheers,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au      Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/          http://twitter.com/vlsci



[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slower than with mpirun)

2013-08-07 Thread Danny Auble


Just a note: if srun isn't used to launch a task, the odds of the
accounting for that step being correct are very low. Using srun is the
only known way to guarantee that step accounting is accurate. The same
goes for enforcing memory limits.
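
To illustrate, the difference is just in how the step is launched from
the batch script; a minimal sketch of the two variants (NAMD arguments
omitted, node/task counts taken from Chris's job):

#!/bin/bash
#SBATCH --nodes=69
#SBATCH --ntasks-per-node=16

# Either: launch via Open-MPI's own starter. Slurm only sees one orted
# task per node, so per-task accounting and memory limits apply to each
# orted's whole process tree.
mpirun namd2 ...

# Or: launch via srun. Slurm starts, accounts for, and can constrain
# every namd2 task individually.
srun namd2 ...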


Danny

On 08/07/13 00:43, Christopher Samuel wrote:


On 07/08/13 16:59, Janne Blomqvist wrote:


That is, the memory accounting is per task, and when launching
using mpirun the number of tasks does not correspond to the number
of MPI processes, but rather to the number of orted processes (1
per node).

That appears to be correct: I see 1 batch task and 68 orted tasks when
I use mpirun, whereas I see 1 batch task and 1104 namd2 tasks when I
use srun.

I can understand how that might result in Slurm (wrongly) concluding
that a single task is using more than its allowed memory per task, but
I'm not sure how it could lead to Slurm thinking the job is using
vastly more memory than it actually is.


cheers,
Chris
--
  Christopher Samuel        Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au      Phone: +61 (0)3 903 55545
  http://www.vlsci.org.au/          http://twitter.com/vlsci



[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slower than with mpirun)

2013-08-07 Thread Christopher Samuel


Hi Danny,

On 08/08/13 04:08, Danny Auble wrote:

 Just a note: if srun isn't used to launch a task, the odds of the
 accounting for that step being correct are very low. Using srun is
 the only known way to guarantee that step accounting is accurate.
 The same goes for enforcing memory limits.

Thanks, and I understand why that is; it's just a shame that the
performance penalty for using srun with Open-MPI makes it unusable. :-(

cheers!
Chris
--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au      Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/          http://twitter.com/vlsci
