[slurm-dev] slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slower than with mpirun)

2013-08-07 Thread Christopher Samuel

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 23/07/13 17:06, Christopher Samuel wrote:

 Bringing up a new IBM SandyBridge cluster I'm running a NAMD test 
 case and noticed that if I run it with srun rather than mpirun it 
 goes over 20% slower.

Following on from this issue, we've found that whilst mpirun gives
acceptable performance the memory accounting doesn't appear to be correct.

Anyone seen anything similar, or any ideas on what could be going on?
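
For reference, the two launch modes being compared look roughly like this
inside the job's batch script (a sketch only; the binary name and input file
are illustrative, not the exact command lines used here):

# Launched via Open-MPI's own launcher, which starts one orted per node:
mpirun ./namd2 run.namd

# Launched directly by Slurm, one task per allocated core:
srun ./namd2 run.namd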

Here are two identical NAMD jobs running over 69 nodes using 16 cores
per node; this one was launched with mpirun (Open-MPI 1.6.5):


== slurm-94491.out ==
WallClock: 101.176193  CPUTime: 101.176193  Memory: 1268.554688 MB
End of program

[samuel@barcoo-test Mem]$ sacct -j 94491 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94491
94491.batch    6504068K  11167820K
94491.0        5952048K   9028060K


This one launched with srun (about 60% slower):

== slurm-94505.out ==
WallClock: 163.314163  CPUTime: 163.314163  Memory: 1253.511719 MB
End of program

[samuel@barcoo-test Mem]$ sacct -j 94505 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94505
94505.batch       7248K   1582692K
94505.0        1022744K   1307112K



cheers!
Chris
- -- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlIB5sEACgkQO2KABBYQAh9QMQCfQ57w0YqVDwgyGRqUe3dSvQDj
e9cAnRRx/kDNUNqUCuFGY87mXf2fMOr+
=JUPK
-END PGP SIGNATURE-


[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slower than with mpirun)

2013-08-07 Thread Christopher Samuel

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/08/13 16:19, Christopher Samuel wrote:

 Anyone seen anything similar, or any ideas on what could be going
 on?

Sorry, this was with:

# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
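
For comparison, the cgroup-based gathering mentioned below is typically
configured along these lines (a sketch based on the Slurm documentation, not
necessarily the exact settings tried on this cluster):

# slurm.conf
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherFrequency=30
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

# cgroup.conf
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes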

Since those initial tests we've started enforcing memory limits (the
system is not yet in full production) and found that this causes jobs
to get killed.

We tried the cgroups gathering method, but jobs still die with mpirun,
and now the numbers don't seem to be right for either mpirun or srun:

mpirun (killed):

[samuel@barcoo-test Mem]$ sacct -j 94564 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94564
94564.batch    -523362K          0
94564.0         394525K          0

srun:

[samuel@barcoo-test Mem]$ sacct -j 94565 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94565
94565.batch        998K          0
94565.0          88663K          0


All the best,
Chris
- -- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlIB73wACgkQO2KABBYQAh+kwACfYnMbONcpxD2lsM5i4QDw5r93
KpMAn2hPUxMJ62u2gZIUGl5I0bQ6lllk
=jYrC
-END PGP SIGNATURE-


[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slower than with mpirun)

2013-08-07 Thread Janne Blomqvist


On 2013-08-07 09:19, Christopher Samuel wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 23/07/13 17:06, Christopher Samuel wrote:


Bringing up a new IBM SandyBridge cluster I'm running a NAMD test
case and noticed that if I run it with srun rather than mpirun it
goes over 20% slower.


Following on from this issue, we've found that whilst mpirun gives
acceptable performance the memory accounting doesn't appear to be correct.

Anyone seen anything similar, or any ideas on what could be going on?


See my message from yesterday

https://groups.google.com/d/msg/slurm-devel/BlZ2-NwwCCg/03DnMEWYHqUJ

for what I think is the reason. That is, the memory accounting is per 
task, and when launching using mpirun the number of tasks does not 
correspond to the number of MPI processes, but rather to the number of 
orted processes (1 per node).
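
One way to see this directly is to ask sacct for the number of tasks in each
step, for example (illustrative; the NTasks field should be available in
current sacct versions):

sacct -j 94491 -o JobID,NTasks,NNodes,MaxRSS

With mpirun the step should report roughly one task per node (the orted
daemons), whereas with srun it should report one task per MPI rank.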




Here are two identical NAMD jobs running over 69 nodes using 16 cores
per node; this one was launched with mpirun (Open-MPI 1.6.5):


== slurm-94491.out ==
WallClock: 101.176193  CPUTime: 101.176193  Memory: 1268.554688 MB
End of program

[samuel@barcoo-test Mem]$ sacct -j 94491 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94491
94491.batch    6504068K  11167820K
94491.0        5952048K   9028060K


This one launched with srun (about 60% slower):

== slurm-94505.out ==
WallClock: 163.314163  CPUTime: 163.314163  Memory: 1253.511719 MB
End of program

[samuel@barcoo-test Mem]$ sacct -j 94505 -o JobID,MaxRSS,MaxVMSize
       JobID     MaxRSS  MaxVMSize
------------ ---------- ----------
94505
94505.batch       7248K   1582692K
94505.0        1022744K   1307112K



cheers!
Chris
- --
  Christopher Samuel        Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.org.au/  http://twitter.com/vlsci

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlIB5sEACgkQO2KABBYQAh9QMQCfQ57w0YqVDwgyGRqUe3dSvQDj
e9cAnRRx/kDNUNqUCuFGY87mXf2fMOr+
=JUPK
-END PGP SIGNATURE-




--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & BECS
+358503841576 || janne.blomqv...@aalto.fi


[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slower than with mpirun)

2013-08-07 Thread Christopher Samuel

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/08/13 16:59, Janne Blomqvist wrote:

 That is, the memory accounting is per task, and when launching
 using mpirun the number of tasks does not correspond to the number
 of MPI processes, but rather to the number of orted processes (1
 per node).

That appears to be correct: I am seeing 1 task for the batch step and 68
tasks for orted when I use mpirun, whilst I see 1 task for the batch step
and 1104 tasks for namd2 when I use srun.

I could understand how that might result in Slurm (wrongly) thinking
that a single task is using more than its allowed memory per task,
but I'm not sure I understand how that could lead to Slurm thinking
the job is using vastly more memory than it actually is.


cheers,
Chris
- -- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlIB+lgACgkQO2KABBYQAh8uqgCdGuA03jCEdJVJE2dJGBHEJjb/
WY4An3em/48L25xq4Ui/GHijSJY2Oo6T
=Zk4G
-END PGP SIGNATURE-


[slurm-dev] Re: 2.6.0 html documentation says CR_Core_Memory Not yet implemented

2013-08-07 Thread Moe Jette


That was old documentation. We'll fix that with the next web page update.

Quoting Jeff Tan jeffe...@au1.ibm.com:




Dear SchedMD,

Just verifying what might be no more than a missed doc update: is
CR_Core_Memory definitely implemented in 2.6.0? The cons_res.html that
comes with it says it isn't, but it seems to work when activated.
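
For reference, the setting in question amounts to something like this in
slurm.conf (a sketch only):

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory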

Regards
Jeff


Dr. Jeff Tan

High Performance Computing Specialist
IBM Research Collaboratory for Life Sciences (Melbourne, Australia)
Phone: +61 3 90 354392




[slurm-dev] Re: cons_res: Can't use Partition SelectType

2013-08-07 Thread Eva Hocks




Hi Magnus,

Thanks for the reply. I am using

slurm.conf:
SelectTypeParameters=CR_CPU_Memory

to manage memory on the nodes. I am using

partition.conf:
SelectTypeParameters=CR_Core

in one partition to allow gpu jobs to run without memory problems.

The documentation states that this is a valid setup, but the log shows it's
not implemented?

How do I monitor memory in all but one partition?
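
For illustration, one combination that appears to satisfy the constraint
described in Magnus's reply below would be the following (an untested sketch;
node names are made up, and CR_ALLOCATE_FULL_SOCKET availability depends on
the Slurm version):

# slurm.conf
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_ALLOCATE_FULL_SOCKET

# partition.conf (gpu partition only)
PartitionName=gpu Nodes=gpu[01-04] SelectTypeParameters=CR_Core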

Thanks
Eva

On Wed, 7 Aug 2013, Magnus Jonsson wrote:


 Hi!

 To use the partition-level SelectTypeParameters you must use CR_Socket or (CR_Core
 and CR_ALLOCATE_FULL_SOCKET) as the default SelectTypeParameters.

 You are using CR_CPU_Memory.

 Best regards,
 Magnus

 On 2013-08-05 23:13, Eva Hocks wrote:
 
 
 
  I am getting spam messages in the logs:
 
  [2013-08-05T14:04:32.000] cons_res: Can't use Partition SelectType
  unless using CR_Socket or CR_Core and CR_ALLOCATE_FULL_SOCKET
 
 
 
  The slurm.conf settings are:
 
  SelectType=select/cons_res
  SelectTypeParameters=CR_CPU_Memory
 
 
 
  and I have set one partition in partitions.conf to
  SelectTypeParameters=CR_Core
 
 
  Why does slurm complain? I HAVE set CR_Core and I even checked the
  spelling.
 
 
  Thanks
  Eva
 



[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slower than with mpirun)

2013-08-07 Thread Danny Auble


Just a note: if srun isn't used to launch a task, the odds of the accounting 
for that step being correct are very low.  Using srun is the only known 
way to always guarantee that step accounting is accurate.  This also 
goes for handling memory limits.
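
For steps that are launched with srun, the live figures can also be checked
while the job is running, for example (illustrative job/step IDs):

sstat -j 94505.0 -o JobID,MaxRSS,MaxVMSize

For an mpirun-launched job, the step that Slurm sees contains only the orted
daemons, so both sstat and the post-job sacct figures for that step may not
reflect the real application processes, as discussed earlier in the thread.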


Danny

On 08/07/13 00:43, Christopher Samuel wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/08/13 16:59, Janne Blomqvist wrote:


That is, the memory accounting is per task, and when launching
using mpirun the number of tasks does not correspond to the number
of MPI processes, but rather to the number of orted processes (1
per node).

That appears to be correct: I am seeing 1 task for the batch step and 68
tasks for orted when I use mpirun, whilst I see 1 task for the batch step
and 1104 tasks for namd2 when I use srun.

I could understand how that might result in Slurm (wrongly) thinking
that a single task is using more than its allowed memory per task,
but I'm not sure I understand how that could lead to Slurm thinking
the job is using vastly more memory than it actually is.


cheers,
Chris
- -- 
  Christopher Samuel        Senior Systems Administrator

  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.org.au/  http://twitter.com/vlsci

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlIB+lgACgkQO2KABBYQAh8uqgCdGuA03jCEdJVJE2dJGBHEJjb/
WY4An3em/48L25xq4Ui/GHijSJY2Oo6T
=Zk4G
-END PGP SIGNATURE-


[slurm-dev] Re: Jobs not queued in SLURM 2.3

2013-08-07 Thread José Manuel Molero
Hi Carles,
Thanks for your reply.
I don't see any errors in the log files. I used the -v flag and there are no errors.
Thanks for the explanation about backfill. I used backfilling because it was the
default option, but I had to eliminate the walltime limits due to some user
complaints.
This strange behaviour only affects one specific user. He can send several jobs
to the queue (and they are dispatched and work fine), but it seems as if there were
a limit on the number of running jobs or resources for this user: beyond that point
his jobs aren't dispatched (they don't appear), while other jobs keep running, and
he can still submit other jobs that use fewer resources, even though there are
enough free resources.
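
If an association limit is suspected, the per-user limits can be inspected
with something like this (illustrative; the available fields vary with the
Slurm version, and <username> is a placeholder):

sacctmgr show assoc where user=<username> format=User,Account,MaxJobs,MaxSubmitJobs,GrpJobs
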
Thanks in advance.
Regards.

Date: Wed, 7 Aug 2013 01:13:44 -0700
From: mini...@gmail.com
To: slurm-dev@schedmd.com
Subject: [slurm-dev] Re: Jobs not queued in SLURM 2.3

 
 

Hi José Manuel,

Do you see any errors in the controller logfile?

Having backfill with unlimited-time jobs is useless because it will never 
backfill a job, but it should not affect the issue of the jobs not appearing.
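
If walltime limits are reintroduced only as partition defaults, backfill has
something to work with again; a partition-level sketch with illustrative names
and values:

PartitionName=batch Nodes=node[001-100] Default=YES DefaultTime=01:00:00 MaxTime=48:00:00 State=UP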


Are any of the user's jobs dispatched?

Regards,
Carles Fenoy
Barcelona Supercomputing Center


On Wed, Aug 7, 2013 at 9:57 AM, José Manuel Molero jml...@hotmail.com wrote:




Hi,
I have a cluster working fine with Slurm 2.3, but I noticed that a specific user
cannot queue (or dispatch) many jobs at the same time. When this user queues many
jobs, the later jobs are submitted correctly (they are given a job number) but are
not added to the queue, even though the cluster has free resources.

https://computing.llnl.gov/linux/slurm/faq.html#time
This is not the problem, because other users can allocate jobs without problems.

The scheduler type is backfill, and all the jobs are time-unlimited. Could this
be the problem? Why do the jobs disappear (they are not added to the queue) rather
than going to the PD state if there are free resources?

Thanks in advance.
Best regards.
  



-- 
--
Carles Fenoy



  

[slurm-dev] Increase user priority on a specific partition

2013-08-07 Thread Neil Van Lysel

Hello,

Is there a way to increase a user's priority on a specific partition 
without using QOS?
When this user runs jobs on this partition, I want their jobs to have 
higher priority than all of the other jobs in the partition.
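
One approach sometimes used for this, offered only as a sketch and not as
advice from this thread, is a dedicated partition over the same nodes with a
higher partition priority and restricted access (it also relies on a non-zero
PriorityWeightPartition in slurm.conf when the multifactor priority plugin is
in use; the names below are made up):

PartitionName=urgent Nodes=node[001-032] Priority=10 AllowGroups=specialgroup State=UP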


Thanks for your help,
Neil Van Lysel





[slurm-dev] Re: slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slower than with mpirun)

2013-08-07 Thread Christopher Samuel

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Danny,

On 08/08/13 04:08, Danny Auble wrote:

 Just a note, if srun isn't used to launch a task the odds of
 accounting for the step being correct are very low.  Using srun is
 the only known way to always guarantee accounting for steps to be
 accurate.  This also goes for handling memory limits.

Thanks, and I understand why that is; it's just a shame that the
performance penalty for using srun with Open-MPI makes it unusable. :-(

cheers!
Chris
- -- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlIDGgsACgkQO2KABBYQAh8fhgCeIMPlkusK2JD3Ns11W8gq1gAx
0aIAn0uGgyRKVNJvmDcf11ZTOqZJobcn
=yg4K
-END PGP SIGNATURE-