[slurm-dev] cpu identifier

2016-09-13 Thread andrealphus

Is there an environment variable available in sbatch that holds the
CPU(s) the current job is being run on? (Not the number of CPUs, but a
CPU identifier.)
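
(No such variable comes to mind; one workaround is to read the kernel's
CPU affinity mask from inside the job script. A minimal sketch, assuming
Linux; taskset is part of util-linux:)

#!/bin/bash
#SBATCH -n 1
# CPU IDs this task is allowed to run on, from the kernel affinity mask
grep Cpus_allowed_list /proc/self/status
# alternative, if util-linux is installed:
taskset -cp $$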


[slurm-dev] Re: Missing user in sreport

2016-09-13 Thread Paddy Doyle

Hmm, aside from a possible timeout issue (I think it does take a while for
accounting info to be sent to the dbd -- is it sent every hour?), I wonder
whether the test jobs you ran were definitely running in that account. The
fact that you used 'DefaultAccount=' when creating the association does
suggest that the account should have been in use, though.

If you run 'scontrol show job N' when the job is running (or pending), does
it include 'Account=testgroup' in the output?
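
For example (N being your real job id):

scontrol show job N | grep -o 'Account=[^ ]*'
# should print Account=testgroup if the association is being used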

Paddy

On Mon, Sep 12, 2016 at 11:50:56PM -0700, Eneko Anasagasti wrote:

> Hi,
> 
> Thanks for your answer, Paddy. Actually, you were right about the user not
> being in a slurm account.
> 
> So I added it with the following command:
> 
> sacctmgr -i add user testuser DefaultAccount=testgroup FairShare=100
> 
> And now it will show up when listing associations, but it still doesn't
> show up in sreport...
> 
> I'm trying this:
> 
> sreport cluster AccountUtilizationByUser Start=2016-03-01-00:00:00
> End=2016-09-12-23:59:59 -t percent
> 
> (I submitted slurm jobs myself as this user to try to make them show up
> in the report)
> 
> Thanks,
> 
> 
> Eneko Anasagasti
> IT Engineer
> BCAM - Basque Center for Applied Mathematics
> Alameda de Mazarredo, 14
> E-48009 Bilbao, Basque Country - Spain
> Tel. +34 946 567 842
> eanasaga...@bcamath.org | www.bcamath.org/eanasagasti
> (matematika mugaz bestalde)
> 
> On 09/09/16 10:17, Paddy Doyle wrote:
> >Hi Eneko,
> >
> >On Fri, Sep 09, 2016 at 12:12:32AM -0700, Eneko Anasagasti wrote:
> >
> >>Hi,
> >>
> >>I just realized a user is missing from sreport, although it is perfectly
> >>visible with sacct.
> >>
> >>We are using slurm 15.8.7-1.1
> >>
> >>After doing a small analysis I found out that this user wasn't a member of
> >>the group object it should be in openldap.
> >>
> >>So I added it where it belonged. But apparently this still wasn't enough.
> >I don't think that slurm would interact directly with LDAP. It's more
> >likely that the user is not in a slurm Account, as managed by sacctmgr.
> >If your sreport is something like "sreport cluster
> >AccountUtilizationByUser..." then I think users need to be in a slurm
> >Account/Association before they are visible in that report. Does the
> >user show up in "sacctmgr list associations cluster=XX"?
> >
> >As a wild guess, do you have a process that looks at a user's group
> >membership in LDAP and then adds them to a slurm Account? Perhaps that
> >hadn't run yet for the user?
> >
> >>I also tried using slurmreport but I am getting the following error
> >>
> >>Can't locate Slurm.pm in @INC
> >I don't know what 'slurmreport' is, but that looks like a Perl include path
> >error. Maybe you need to install the slurm-perlapi package?
> >
> >Thanks,
> >Paddy
> >
> 

-- 
Paddy Doyle
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
Phone: +353-1-896-3725
http://www.tchpc.tcd.ie/


[slurm-dev] Re: Configuring slurm to use all CPUs on a node

2016-09-13 Thread Jason Bacon


On 09/12/16 20:12, Christopher Samuel wrote:

> On 13/09/16 02:58, andrealphus wrote:
>
>> It doesn't seem like changing it to a different resource allocation
>> method makes a difference, and almost seems buggy to me, but I guess
>> it's just a quirk of multithreaded systems.
>
> Yeah - a hyperthread is not a core, so running multiple jobs on the
> same core will usually hurt performance; it may be better to let the
> application use all threading units on the core instead (assuming it's
> multithreaded).
>
> However, this all depends on your application (and if it's single
> threaded you'll never get an improvement), so I'd suggest benchmarking
> with your current configuration versus disabling HT and running on real
> cores only.
>
> Basically, whichever gets you better throughput should be your default
> config.
>
> All the best,
> Chris


In the past I've run 3dDeconvolve jobs on Pentium 4s with hyperthreading
and got great results. The performance was equivalent to about 1.8 real
cores.

More recently we did some tests with NCBI BLAST and got zero gain from
hyperthreading. (Same run time on a machine with 16 hyperthreaded cores,
whether using 16 or 32 processes.)
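
(As an aside, that kind of comparison can be scripted without touching the
BIOS, using sbatch's --hint flags -- a sketch, with bench.sh as a
placeholder job script:)

# same job twice: once packing tasks onto hyperthreads, once one task per core
sbatch --hint=multithread   bench.sh
sbatch --hint=nomultithread bench.sh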


We've also found that it complicates CPU affinity, so we ultimately 
disabled it at the BIOS level.
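
To double-check the result, one can compare what the kernel and what slurmd
each report (a sketch; lscpu is from util-linux, and slurmd -C must be run
on the node itself):

lscpu | grep -E '^(Socket|Core|Thread|CPU\(s\))'   # "Thread(s) per core: 1" => HT off
slurmd -C                                          # node config as slurmd detects it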


JB


[slurm-dev] spank plugin roll out on a cluster

2016-09-13 Thread Hendryk Bockelmann

Hello,

I was wondering what is the best/correct way to roll out a new version
of a spank plugin (i.e. the *.so file) on the cluster while keeping
production running.
We used to submit, as root, one job per node with high priority that
copies the file from a global lustre directory to the node-local
/etc/slurm. Unfortunately, this causes the job to get stuck in the
running state and finally hit the time limit. In slurmd.log one can see
a message like

error: _step_connect: connect() failed dir /var/log/slurm/spool_slurmd/
node m10037 job 3849566 step -2 Connection refused
_handle_stray_script: Purging vestigial job script
/var/log/slurm/spool_slurmd//job3849566/slurm_script

Hence, this does not seem to be a good idea...
Any suggestions for doing this better?

Thanks, Hendryk
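
(One alternative, sketched under the assumption that pdsh and root ssh
access to the nodes are available; node names and paths below are
placeholders: push the file outside of Slurm instead of through a job.)

# copy the new plugin from the lustre share to each node's /etc/slurm
pdsh -w 'm[10001-10100]' \
    'cp /lustre/global/slurm/plugins/myplugin.so /etc/slurm/myplugin.so'

Since spank plugins are loaded by slurmstepd at launch time, already-running
steps should keep the old version and new launches should pick up the new
file.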





[slurm-dev] Preemption, signals and Gracetime

2016-09-13 Thread Near-Ansari, Naveed
We are setting up preemption using QOS on our cluster. The documentation
seems to say that when a job is selected for preemption it should get
SIGCONT and SIGTERM, and then SIGTERM, SIGCONT, and SIGKILL at the end of
GraceTime.

We have checked all of this: we are sent the signals at the end of
GraceTime, but not when selected for preemption. We are listening for
these signals to checkpoint when preempted. We are checking for the
signals both in the script and in the launched executables, in case we
are wrong about what catches the signals.
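
For reference, the trap setup in our batch script looks roughly like this
(a sketch; my_app stands in for the real executable):

#!/bin/bash
checkpoint() {
    echo "preemption signal received, checkpointing"
    # ... checkpoint logic here ...
}
trap checkpoint TERM CONT
./my_app &     # background the payload so the shell can process traps
wait $!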

I am unclear whether the problem is my understanding of how this is
supposed to work, my configuration, or the documentation.

The doc that mentions the signals sent is http://slurm.schedmd.com/preempt.html.

This is the qos setup:

      Name  GraceTime    Preempt PreemptMode
---------- ---------- ---------- -----------
    normal   00:00:00                cluster
    sxs-lo   00:20:00                 cancel
    sxs-hi   00:20:00     sxs-lo      cancel


/etc/slurm/slurm.conf:

…
PreemptType=preempt/qos
PreemptMode=CANCEL
…


What am I doing wrong on this?

Thanks,

Naveed 



[slurm-dev] Re: Configuring slurm to use all CPUs on a node

2016-09-13 Thread Benjamin Redling

On 09/12/2016 18:57, andrealphus wrote:
> It doesn't seem like changing it to a different resource allocation
> method makes a difference, and almost seems buggy to me, but I guess
> it's just a quirk of multithreaded systems.

Your issue ("using all hyperthreads") was discussed multiple times on
the list in the not so distant past.

The resource allocation method alone won't do it:
http://slurm.schedmd.com/faq.html#cpu_count
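
If I read that FAQ entry correctly, the combination it describes for
treating every hyperthread as a schedulable CPU is roughly (a sketch for
the 2x8x2 node discussed here; the node name is a placeholder):

SelectType=select/cons_res
SelectTypeParameters=CR_CPU
NodeName=node01 CPUs=32   # and no Sockets/CoresPerSocket/ThreadsPerCore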

Anyway, I think you are on the right track.
BR
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Configuring slurm to use all CPUs on a node

2016-09-13 Thread Benjamin Redling

On 09/12/2016 16:55, Uwe Sauter wrote:
> 
> Also. CPUs=32 is wrong. You need
> 
> Sockets=2 CoresPerSocket=8 ThreadsPerCore=2

Setting "CPU" is not wrong according to the FAQ:
http://slurm.schedmd.com/faq.html#cpu_count

BR
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Configuring slurm to use all CPUs on a node

2016-09-13 Thread Benjamin Redling



On 09/12/2016 16:48, Uwe Sauter wrote:
> 
> Try SelectTypeParameters=CR_Core instead of CR_CPU

That alone is not sufficient:
http://slurm.schedmd.com/faq.html#cpu_count

BR
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Configuring slurm to use all CPUs on a node

2016-09-13 Thread Barbara Krasovec

Hello!


On 09/12/2016 06:57 PM, andrealphus wrote:
> 
> For future users' reference: if you set
> 
> Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
> 
> or, similarly,
> 
> CPUs=32 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
> 
> your queue will show 32 CPUs available (e.g. sinfo -o %C), but for
> some reason the lowest available allocation level will be the core,
> and hence any job you run will automatically allocate a whole core
> (2 CPUs) per job, and you will be limited to 16 jobs.

It also works if you put:
Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
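
Either way, it's worth comparing what slurmd detects against what ends up
schedulable (a sketch; slurmd -C is run on the node itself):

slurmd -C                 # hardware as slurmd sees it
sinfo -N -o '%N %c %C'    # per-node CPU count and allocated/idle/other/total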

Cheers,
Barbara