[slurm-dev] Re: Fixing corrupted slurm accounting?

2017-10-28 Thread Douglas Jacobsen
A more complete response would be something like:

MariaDB [slurm_acct_db]> select * from _last_ran_table;
+---------------+--------------+----------------+
| hourly_rollup | daily_rollup | monthly_rollup |
+---------------+--------------+----------------+
|    1509206400 |   1509174000 |     1506841200 |
+---------------+--------------+----------------+
1 row in set (0.00 sec)

MariaDB [slurm_acct_db]> update _last_ran_table set
    hourly_rollup=UNIX_TIMESTAMP('2017-01-01 00:00:00'),
    daily_rollup=UNIX_TIMESTAMP('2017-01-01 00:00:00'),
    monthly_rollup=UNIX_TIMESTAMP('2017-01-01 00:00:00');
Query OK, 1 row affected (0.05 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [alva_slurm_acct_db]> select * from _last_ran_table;
+---------------+--------------+----------------+
| hourly_rollup | daily_rollup | monthly_rollup |
+---------------+--------------+----------------+
|    1483257600 |   1483257600 |     1483257600 |
+---------------+--------------+----------------+
1 row in set (0.01 sec)

MariaDB [slurm_acct_db]> quit

Make changes to the timestamps and "" as appropriate for your cluster.

Obviously mucking with the database is dangerous, so be careful.
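To make the cleanup Bill describes more concrete, here is a hedged SQL sketch;
the table and column names are assumptions based on the thread (check DESCRIBE
output for your own slurm_acct_db, e.g. a cluster-prefixed job table with
time_start/time_end in newer schemas), and the state codes assume 0 = pending
and 1 = running:

    -- Find job records with no end time that are not pending or running:
    SELECT id_job, time_start, time_end, state
      FROM job_table
     WHERE time_end = 0 AND state NOT IN (0, 1);

    -- The fix described in the thread: set end time equal to start time.
    UPDATE job_table
       SET time_end = time_start
     WHERE time_end = 0 AND state NOT IN (0, 1);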


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>
dmjacob...@lbl.gov



On Sat, Oct 28, 2017 at 9:17 AM, Douglas Jacobsen <dmjacob...@lbl.gov>
wrote:

> Once you've got the end times fixed, you'll need to manually update the
> timestamps in the _last_ran table to some time point before the
> start of the earliest job you fixed.  Then, on the next hour mark, it'll
> start re-rolling up the past data to reflect the new reality you've set in
> the database.
>
> Unfortunately I'm away from a keyboard right now so I'm not 100% certain
> of the table name.
>
> On Oct 28, 2017 09:09, "Doug Meyer" <dameye...@gmail.com> wrote:
>
>> Look up orphan jobs and lost.pl (quick script to find orphans) in
>> https://groups.google.com/forum/#!forum/slurm-devel.
>>
>> Battling this myself right now.
>>
>> Thank you,
>> Doug
>>
>> On Fri, Oct 27, 2017 at 9:00 PM, Bill Broadley <b...@cse.ucdavis.edu>
>> wrote:
>>
>>>
>>>
>>> I noticed crazy high numbers in my reports, things like sreport user top:
>>> Top 10 Users 2017-10-20T00:00:00 - 2017-10-26T23:59:59 (604800 secs)
>>> Use reported in Percentage of Total
>>> --------------------------------------------------------------------------------
>>>   Cluster     Login  Proper Name   Account      Used    Energy
>>> --------- --------- ------------ --------- --------- ---------
>>>   MyClust   JoeUser     Joe User      jgrp  3710.15%     0.00%
>>>
>>> This was during a period when JoeUser hadn't submitted a single job.
>>>
>>> We have been through some slurm upgrades, figured one of the schema
>>> tweaks had
>>> confused things.  I looked in the slurm accounting table and found the
>>> job_table.  I found 80,000 jobs with no end_time, that weren't actually
>>> running.
>>>  So I set the end_time = begin time for those 80,000 jobs.  It didn't
>>> help the
>>> reports.
>>>
>>> I then tried deleting all 80,000 jobs from the job_table and that didn't
>>> help
>>> either.
>>>
>>> Is there a way to rebuild the accounting data from the information in
>>> the job_
>>> table?
>>>
>>> Or any other suggestion for getting some sane numbers out?
>>>
>>
>>


[slurm-dev] Re: Fixing corrupted slurm accounting?

2017-10-28 Thread Douglas Jacobsen
Once you've got the end times fixed, you'll need to manually update the
timestamps in the _last_ran table to some time point before the start of the
earliest job you fixed.  Then, on the next hour mark, it'll start re-rolling up
the past data to reflect the new reality you've set in the database.

Unfortunately I'm away from a keyboard right now so I'm not 100% certain of
the table name.

On Oct 28, 2017 09:09, "Doug Meyer"  wrote:

> Look up orphan jobs and lost.pl (quick script to find orphans) in
> https://groups.google.com/forum/#!forum/slurm-devel.
>
> Battling this myself right now.
>
> Thank you,
> Doug
>
> On Fri, Oct 27, 2017 at 9:00 PM, Bill Broadley 
> wrote:
>
>>
>>
>> I noticed crazy high numbers in my reports, things like sreport user top:
>> Top 10 Users 2017-10-20T00:00:00 - 2017-10-26T23:59:59 (604800 secs)
>> Use reported in Percentage of Total
>> --------------------------------------------------------------------------------
>>   Cluster     Login  Proper Name   Account      Used    Energy
>> --------- --------- ------------ --------- --------- ---------
>>   MyClust   JoeUser     Joe User      jgrp  3710.15%     0.00%
>>
>> This was during a period when JoeUser hadn't submitted a single job.
>>
>> We have been through some slurm upgrades, figured one of the schema
>> tweaks had
>> confused things.  I looked in the slurm accounting table and found the
>> job_table.  I found 80,000 jobs with no end_time, that weren't actually
>> running.
>>  So I set the end_time = begin time for those 80,000 jobs.  It didn't
>> help the
>> reports.
>>
>> I then tried deleting all 80,000 jobs from the job_table and that didn't
>> help
>> either.
>>
>> Is there a way to rebuild the accounting data from the information in the
>> job_
>> table?
>>
>> Or any other suggestion for getting some sane numbers out?
>>
>
>


[slurm-dev] Re: Running jobs are stopped and requeued when adding new nodes

2017-10-22 Thread Douglas Jacobsen
You cannot change the nodelist without draining the system of running jobs
(terminating all slurmstepd) and restarting all slurmd and slurmctld.  This
is because slurm uses a bit mask to represent the nodelist, and slurm uses
a hierarchical overlay communication network. If all daemons don't have the
same idea of that network you can run into communication problems which can
cause nodes to be marked down, killing the jobs running upon them.

I think if you are not using message aggregation, you might be able to get
away with leaving jobs running and just restarting all slurmd and
slurmctld.  But the tricky thing is you'll need to quiesce a lot of the
rpcs on the system which can partially be done by marking partitions down,
but not completely.

If you are thinking of adding nodes, I think you should look at the FUTURE
state that nodes can take (State=FUTURE in slurm.conf). I haven't played with
this, but I suspect it might buy you some flexibility.
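For reference, a minimal slurm.conf sketch of what I mean; the node names and
hardware values here are made up:

    # Existing production nodes
    NodeName=node[001-032] CPUs=16 RealMemory=64000 State=UNKNOWN
    # Placeholders for hardware that is not installed yet; defining them now
    # means the node list does not have to be reshaped when they arrive.
    NodeName=node[033-040] CPUs=16 RealMemory=64000 State=FUTURE
    PartitionName=batch Nodes=node[001-040] Default=YES State=UP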

On Oct 22, 2017 11:43, "JinSung Kang"  wrote:

> Hello,
>
> I am having trouble with adding new nodes into slurm cluster without
> killing the jobs that are currently running.
>
> Right now I
>
> 1. Update the slurm.conf and add a new node to it
> 2. Copy new slurm.conf to all the nodes,
> 3. Restart the slurmd on all nodes
> 4. Restart the slurmctld
>
> But when I restart slurmctld, all the jobs that were currently running are
> requeued, with (Begin Time) as the reason for not running. The newly added
> node works perfectly fine.
>
> I've included the slurm.conf. I've also included slurmctld.log output when
> I'm trying to add the new node.
>
> Cheers,
>
> Jin
>


[slurm-dev] Re: slurmctld causes slurmdbd to seg fault

2017-10-17 Thread Douglas Jacobsen
You probably have a core file in the directory where slurmdbd logs to; a
backtrace from gdb would be most telling.

On Oct 17, 2017 08:17, "Loris Bennett"  wrote:

>
> Hi,
>
> We have been having some problems with NFS mounts via Infiniband getting
> dropped by nodes.  We ended up switching our main admin server, which
> provides NFS and Slurm, from one machine to another.
>
> Now, however, if slurmdbd is started, as soon as slurmctld starts,
> slurmdbd seg faults.  In the slurmdbd.log we have
>
>   slurmdbd: error: We have more allocated time than is possible (7724741 >
> 7012800) for cluster soroban(1948) from 2017-10-17T16:00:00 -
> 2017-10-17T17:00:00 tres 1
>   slurmdbd: error: We have more time than is possible
> (7012800+36720+0)(7049520) > 7012800 for cluster soroban(1948) from
> 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
>   slurmdbd: Warning: Note very large processing time from hourly_rollup
> for soroban: usec=46390426 began=17:08:17.777
>   Segmentation fault (core dumped)
>
> and the corresponding output of strace is
>
>   fstat(3, {st_mode=S_IFREG|0600, st_size=871270, ...}) = 0
>   write(3, "[2017-10-17T17:09:04.168] Warnin"..., 132) = 132
>   +++ killed by SIGSEGV (core dumped) +++
>
> We're running 17.02.7.  Any ideas?
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
>


[slurm-dev] Re: Finding job command after fails

2017-10-15 Thread Douglas Jacobsen
We use a job completion plugin to store that data.  Ours is custom, but it
is loosely based on the elastic completion plugin, which may be a good
starting point.
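If the stock plugin is enough for your needs, the configuration is roughly the
following (a sketch only; the URL is a placeholder, and whether the index path
is appended automatically depends on your version, so check the
jobcomp/elasticsearch docs):

    # slurm.conf
    JobCompType=jobcomp/elasticsearch
    JobCompLoc=http://elasticsearch.example.com:9200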

On Oct 15, 2017 12:48, "Ryan Richholt"  wrote:

> Is there any way to get the job command with sacct?
>
> For example, if I submit a job like this:
>
> $ sbatch testArgs.sh hey there
>
> I can get the full command from "scontrol show job":
>
>   ...
>   Command=/home/rrichholt/scripts/testArgs.sh hey there
>   ...
>
> But, that information is not available long-term with sacct.
>
> To explain why I would like this:
>
> I'm dealing with a workflow that submits lots of jobs for different
> projects. Each submits the same script, but the first argument points to a
> different project directory. When jobs fail, it's very hard to tell which
> project they were working on, because "scontrol show job" only lasts for
> 300 seconds. Sometimes they fail at night and I don't know until the next
> morning.
>


[slurm-dev] Re: slurmstepd error

2017-09-15 Thread Douglas Jacobsen
What is the working directory for slurmd?  I suspect slurmstepd would fork
there; perhaps there is some issue with it?


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov



On Fri, Sep 15, 2017 at 9:04 AM, Gyro Funch  wrote:

>
> Thank you for the responses, Doug and Andy.
>
> There is a /tmp directory on all of the compute nodes and the
> permissions seem okay (drwxrwxrwt).
>
> Running as root, I find:
> $ cd /tmp
> $ srun -N5 whoami
> root
> slurmstepd: error: Unable to get current working directory: No such
> file or directory
> root
> slurmstepd: error: Unable to get current working directory: No such
> file or directory
> root
> slurmstepd: error: Unable to get current working directory: No such
> file or directory
> root
> slurmstepd: error: Unable to get current working directory: No such
> file or directory
> root
>
>
> I also tried
> $ cd /home/slurm
> $ sudo -u slurm srun -N5 whoami
> slurm
> slurmstepd: error: Unable to get current working directory: No such
> file or directory
> slurm
> slurmstepd: error: Unable to get current working directory: No such
> file or directory
> slurm
> slurmstepd: error: Unable to get current working directory: No such
> file or directory
> slurm
> slurmstepd: error: Unable to get current working directory: No such
> file or directory
> slurm
>
>
> Thanks.
>
> -gyro
>
> On 9/15/2017 9:19 AM, Andy Riebs wrote:
> > Actually, it looks like /tmp is missing on the compute nodes?
> >
> > On 09/15/2017 11:12 AM, Doug Meyer wrote:
> >> Re: [slurm-dev] slurmstepd error
> >> Your path is either erroneous in your submission or inaccessible
> >> by the client.
> >> Welcome to slurm!
> >>
> >> Doug
> >>
> >>
> >> On Sep 15, 2017 8:10 AM, "Gyro Funch" wrote:
> >>
> >>
> >> Hi,
> >>
> >> I am a new user of slurm (17.02.7) and just installed and
> >> configured
> >> it on a small compute cluster.
> >>
> >> In testing, I ran into the following situation:
> >>
> >> -
> >>
> >> $ cd /tmp
> >> $ srun -N5 hostname
> >> compute-4
> >> slurmstepd: error: Unable to get current working directory: No
> >> such
> >> file or directory
> >> compute-2
> >> slurmstepd: error: Unable to get current working directory: No
> >> such
> >> file or directory
> >> compute-1
> >> slurmstepd: error: Unable to get current working directory: No
> >> such
> >> file or directory
> >> compute-0
> >> slurmstepd: error: Unable to get current working directory: No
> >> such
> >> file or directory
> >> compute-3
> >>
> >> -
> >>
> >> Does anyone know why the 'Unable to get current working directory'
> >> error arises and how to fix it.
> >>
> >> Thank you for your help.
> >>
> >> Kind regards,
> >> gyro
> >>
> >>
> >
>


[slurm-dev] Re: On the need for slurm uid/gid consistency

2017-09-13 Thread Douglas Jacobsen
I would suggest it is a more general requirement, not simply one enforced by
the use of munge (which does imply a unified uid trust level across all nodes
sharing the same preshared key). When jobs are started, they are started with
a particular uid and other credentials (transmitted in the slurm RPCs) for the
intended user.  If there are different uid/gid values across the system, this
could prove to be problematic in many different ways.

Given that, I would almost suggest that, from a package maintainer
perspective, you should avoid creating the slurm user and leave it for the
site to solve in whatever way makes the most sense for them.
-Doug

On Sep 13, 2017 17:12, "Christopher Samuel"  wrote:

>
> On 13/09/17 04:53, Phil K wrote:
>
> > I'm hoping someone can provide an explanation as to why slurm
> > requires uid/gid consistency across nodes, with emphasis on the need
> > for the 'SlurmUser' to be uid/gid-consistent.
>
> I think this is a consequence of the use of Munge, rather than being
> inherent in Slurm itself.
>
> https://dun.github.io/munge/
>
> # It allows a process to authenticate the UID and GID of another
> # local or remote process within a group of hosts having common
> # users and groups
>
> Gory details are in the munged(8) manual page:
>
> https://github.com/dun/munge/wiki/Man-8-munged
>
> But I think the core of the matter is:
>
> # When a credential is validated, munged first checks the
> # message authentication code to ensure the credential has
> # not been subsequently altered. Next, it checks the embedded
> # UID/GID restrictions to determine whether the requesting
> # client is allowed to decode it.
>
> So if the UID's & GID's of the user differ across systems then it
> appears it will not allow the receiver to validate the message.
>
> cheers,
> Chris
> --
>  Christopher SamuelSenior Systems Administrator
>  Melbourne Bioinformatics - The University of Melbourne
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>


[slurm-dev] Re: How are paired dependencies handled?

2017-08-11 Thread Douglas Jacobsen
I think you want the *kill_invalid_depend* SchedulerParameters option to have
slurmctld automatically clean up jobs that can never run owing to
unsatisfiable dependencies.
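For reference, that is a slurm.conf setting, e.g. (merge it with any
SchedulerParameters line you already have, then scontrol reconfigure):

    SchedulerParameters=kill_invalid_depend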

On Aug 11, 2017 3:58 PM, "Alex Reynolds"  wrote:

> Say I submit a job via `sbatch`. Slurm gives it a job ID of `12345`
>
> I then submit two more jobs. The first job runs with the option
> `--dependency:afterok:12345`. The second job runs with the option
> `--dependency:afternotok:12345`.
>
> Those two jobs wait for the first to finish.
>
> The parent job `12345` finishes successfully.
>
> Does the monitor job with the option `--dependency:afternotok:12345` hang
> around in the cluster queue? Or does it get cleared out?
>
> Accordingly, say job `12345` finishes with a non-zero error.
>
> Does the monitor job with the option `--dependency:afterok:12345` stay in
> the queue, or get removed?
>
> Thanks!
>
> -Alex
>


[slurm-dev] Re: Proposed new dependency "--during"?

2017-07-20 Thread Douglas Jacobsen
This sounds a bit like the jobpacks stuff that is in development right now.
  It's more focused on heterogeneous computing but really, at the core,
it's done as multiple jobs that run simultaneously and then merged (I
think).

But more generally, you could imagine a "service" queue on one set of
resources that might allow users to run long-running things, for example a
mongo or other database-type instance.  You would only want dependent jobs to
run _during_ the much longer-running database or controller job.  I think the
more general capability is a good idea.


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov



On Thu, Jul 20, 2017 at 4:33 PM, Lev Lafayette  wrote:

> On Fri, 2017-07-21 at 11:21 +1200, Gene Soudlenkov wrote:
> > I don't think it's a good feature. Apart from many implementation
> > difficulties, there are logical ones - what happens if the job fails or
> > ends during the activation time? What happens if "during" job cannot be
> > activated immediately due to resource contention?
>
> The test is only a dependency for initiation. If the dependent job fails
> during activation time, it will still launch and probably do nothing
> interesting. If it can't launch due to resource contention then it
> doesn't launch. It's not going to be killing other jobs :)
>
>
> --
> Lev Lafayette, BA (Hons), GradCertTerAdEd (Murdoch), GradCertPM, MBA
> (Tech Mngmnt) (Chifley)
> HPC Support and Training Officer +61383444193 +61432255208
> Department of Infrastructure Services, University of Melbourne
>
>


[slurm-dev] Re: ssh tunneling

2017-06-20 Thread Douglas Jacobsen
salloc  srun --pty -n1 -N1 --mem-per-cpu=0 --cpu_bind=none
--mpi=none $SHELL

will probably do what you want, i.e., get an allocation and start a shell
on the remote node.
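For example, combined with the allocation options from the original post
(the partition and gres values are the poster's own; adjust to taste):

    salloc -p romeo -N 1 --gres=gpu:1 srun --pty -n1 -N1 --mem-per-cpu=0 --cpu_bind=none --mpi=none $SHELL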


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov



On Tue, Jun 20, 2017 at 12:16 PM, Paul Hargrove  wrote:

> The root problem is that "salloc" creates a subshell.
> Therefore your "srsh" command in "rsalloc && srsh" is not going to run
> until after the "rsalloc" subshell exits (at which point the allocation is
> released).
>
> -Paul
>
> On Tue, Jun 20, 2017 at 10:55 AM, 나보균  wrote:
>
>> Hi all,
>>
>> I do not understand some commands related slurm.
>>
>> in a bash sell script, I put
>>
>>
>>
>> #!/bin/bash
>>
>> salloc -p remote_host_name -N 1 --gres=gpu:1
>>
>> ssh -X `echo $SLURM_NODELIST`
>>
>>
>>
>> But this did not work; "salloc" did not work well.
>>
>>
>>
>> so I put them in my .bashrc file such as
>>
>>
>>
>> alias rsalloc="salloc -p romeo -N 1 --gres=gpu:1"
>>
>> alias srsh="ssh -X `echo $SLURM_NODELIST`"
>>
>> alias rshow="rsalloc && srsh"
>>
>>
>>
>>
>>
>> Then, as aliased commands, "rsalloc" and "srsh" work fine, but "rshow"
>> did not work.
>>
>>
>>
>> What is wrong?
>>
>>
>>
>> Bokyoon
>>
>>
>>
>>
>>
>>
>>
>>
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> <(510)%20495-2352>
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> <(510)%20486-6900>
>


[slurm-dev] Re: Scheduling weirdness

2017-06-16 Thread Douglas Jacobsen
I typically recommend that bf_window be roughly 2x the max wall time,  this
allows for planning beyond the edge of the window.  You may need to
increase bf_resolution (it should be fine for almost all cases to go up to
300s), and potentially increase bf_interval to ensure there is enough time
for the backfill scheduler to get through your workload.
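As an illustrative slurm.conf sketch only (the numbers assume a 7-day maximum
wall time; tune them to your own workload):

    # bf_window is in minutes (~2x a 7-day max wall time); bf_resolution and
    # bf_interval are in seconds.
    SchedulerParameters=bf_window=20160,bf_resolution=300,bf_interval=120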

Note that things like completing nodes can cause jobs to continuously move
back in time if they are considered for planning.  An unkillable-job-step
script that downs nodes after some longish period of time (15 minutes,
perhaps), if need be (and nothing else is running on them), can help add some
level of automation here.

-Doug


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov



On Fri, Jun 16, 2017 at 5:21 AM, Robbert Eggermont 
wrote:

>
> On 16-06-17 11:42, TO_Webmaster wrote:
>
>> What are the values of bf_window and bf_resolution in your configuration?
>>
>> > From the documentation of bf_window: "The default value is 1440
> > minutes (one day). A value at least as long as the highest allowed
> > time limit is generally advisable to prevent job starvation."
>
> SchedulerType=sched/backfill
> SchedulerParameters=bf_window=7-0,defer
> SelectType=select/cons_res
> SelectTypeParameters=CR_CORE_MEMORY
>
> bf_resolution should be the default (60s).
>
> Our maximum time-limit is 7-0. Any ideas on what would be the optimal
> bf_window value for this?
>
> In relation to the problem: most running jobs had a time limit of 6-21,
> and the highest priority job was scheduled to start within the 7-0
> bf_window. Does that rule out the bf_window as a factor in the problem, or
> not?
>
> In the schedule iteration at 2017-06-15T23:22:10, the highest priority job
> was scheduled to start at 2017-06-22T18:29:17, 6-19:07:07 ahead, so I don't
> understand why in the same iteration the lower priority job with a longer
> 6-21 time limit was immediately started on the same node?
>
> Using defer was an attempt to get more optimal scheduling, but
> unfortunately it didn't change anything for this problem.
>
> Robbert
>
> 2017-06-16 1:16 GMT+02:00 Robbert Eggermont :
>>
>>>
>>> Hello,
>>>
>>> In our Slurm setup (now 17.02.4) I've noticed several times now that
>>> backfilled jobs push back the start time of the highest priority job.
>>> I'm not sure if this is due to a configuration error or an scheduler
>>> error,
>>> and since I'm having a hard time diagnosing what's happening, I was
>>> hoping
>>> for some insightful tips.
>>>
>>> What happens is that when the highest priority pending job needs a lot of
>>> resources (CPUs, ...), slurm will backfill lower priority jobs with less
>>> requirements but with a higher timelimit than the currently running jobs.
>>>
>>> For example, the highest priority job needs a full node, and the first
>>> node
>>> will become available in 6 days; our slurm will happily backfill all
>>> pending
>>> lower priority 2-CPU 7-day jobs on every possible node in the cluster,
>>> thus
>>> pushing back the highest priority job 1 day.
>>>
>>> Looking into the scheduler debugging info, I noticed some things I can't
>>> explain:
>>> 1) the highest priority job ("A") is not always scheduled to start on the
>>> first node ("1") that will become available;
>>> 2) in the same iteration, the backfill logic will start another, lower
>>> priority, smaller job with a timelimit longer than the expected start
>>> time
>>> of job "A" on the same node "1";
>>> 3) when "A" is scheduled to start on another node, the scheduled starting
>>> time remains the same (i.e. it is not updated to the time that the new
>>> node
>>> becomes available).
>>> 4) the scheduled starting time of the highest priority job ("A") is
>>> sometimes later than the time that the node becomes available;
>>>
>>> See below for some log entries for these events.
>>>
>>> Does anybody have an idea what's going on here, and how we can fix it?
>>>
>>> Robbert
>>>
>>>
>>> 1)
>>> JobID=1315558 has a scheduled start time on node maxwell of
>>> 2017-06-22T19:11:16; forcing it to another node (by draining maxwell)
>>> reduces the start time to 2017-06-22T16:47:43.2017-06-15T23:22:10
>>> (But slurm is consistent: when maxwell is resumed, the job is scheduled
>>> there again, with the later start time.)
>>>

 [2017-06-15T22:11:26.688] backfill: beginning
 [2017-06-15T22:11:26.693] backfill test for JobID=1315558 Prio=22703485
 Partition=general
 [2017-06-15T22:11:26.694] Job 1315558 to start at 2017-06-22T19:11:16,
 end
 at 2017-06-27T07:11:00 on maxwell
 [2017-06-15T22:11:26.694] backfill: reached end of job queue
 [2017-06-15T22:11:52.223] update_node: node maxwell state set to
 DRAINING
 [2017-06-15T22:11:56.695] backfill: beginning
 

[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-05 Thread Douglas Jacobsen
I believe you can use fairshare without decaying usage; the fairshare values
will simply decline over time.  This may mean that a user who consumes a large
portion of their share early may have trouble getting priority later.


On Jun 5, 2017 7:10 AM, "Jacob Chappell" <jacob.chapp...@uky.edu> wrote:

Hi Douglas,

It'd be nice to have the ability to incorporate recent usage into the
priority, but it seems like I can't do both that *and* have hard limits
right? I think hard limits are most important between the two. I should
just be able to set the FairshareWeight to 0 to ignore that component in
the priority, but still enforce the limits with the GrpMins parameters
right?

Thanks,
Jacob Chappell

On Mon, Jun 5, 2017 at 9:05 AM, Douglas Jacobsen <dmjacob...@lbl.gov> wrote:

> Sorry, I meant GrpTRESMins, but that usage is decayed, as Chris mentioned,
> based on the decay rate half life.  In your scenario however, it seems like
> not decaying usage would make sense.
>
> Are you wanting to consider recent usage when making priority decisions?
>
> On Jun 5, 2017 5:53 AM, "Douglas Jacobsen" <dmjacob...@lbl.gov> wrote:
>
>> I think you could still set GrpTRESRunMins on an account or association
>> to set hard quotas.
>>
>> On Jun 5, 2017 5:21 AM, "Jacob Chappell" <jacob.chapp...@uky.edu> wrote:
>>
>>> Hi Chris,
>>>
>>> Thank you very much for the details and clarification. It's unfortunate
>>> that you can't have both fairshare and fixed quotas. I'll pass this
>>> information along to my supervisors.
>>>
>>> Jacob Chappell
>>>
>>> On Sun, Jun 4, 2017 at 7:55 PM, Christopher Samuel <
>>> sam...@unimelb.edu.au> wrote:
>>>
>>>>
>>>> On 03/06/17 07:03, Jacob Chappell wrote:
>>>>
>>>> > Sorry, that was a mouthful, but important. Does anyone know if Slurm
>>>> can
>>>> > accomplish this for me. If so how?
>>>>
>>>> This was how we used to run prior to switching to fair-share.
>>>>
>>>> Basically you set:
>>>>
>>>> PriorityDecayHalfLife=0
>>>>
>>>> which stops the values decaying over time so once they hit their limit
>>>> that's it.
>>>>
>>>> We also set:
>>>>
>>>> PriorityUsageResetPeriod=QUARTERLY
>>>>
>>>> so that limits would reset on the quarter boundaries.  This was because
>>>> we used to have fixed quarterly allocations for projects.
>>>>
>>>> We went to fair-share because of a change of the funding model for us
>>>> meant previous rules were removed and so we could go to fair-share which
>>>> meant a massive improvement in utilisation (compute nodes were no longer
>>>> idle with jobs waiting but unable to run because of being out of quota).
>>>>
>>>> NOTE: You can't have both fairshare and hard quotas at the same time.
>>>>
>>>> All the best,
>>>> Chris
>>>> --
>>>>  Christopher SamuelSenior Systems Administrator
>>>>  Melbourne Bioinformatics - The University of Melbourne
>>>>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>>>>
>>>
>>>


[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-05 Thread Douglas Jacobsen
Sorry, I meant GrpTRESMins, but that usage is decayed, as Chris mentioned,
based on the decay rate half life.  In your scenario however, it seems like
not decaying usage would make sense.

Are you wanting to consider recent usage when making priority decisions?
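A hedged example of setting such a hard cap (the account name and value are
made up; pair it with PriorityDecayHalfLife=0, as discussed in the quoted
thread below, so that the usage never decays):

    sacctmgr modify account myproject set GrpTRESMins=cpu=1000000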

On Jun 5, 2017 5:53 AM, "Douglas Jacobsen" <dmjacob...@lbl.gov> wrote:

> I think you could still set GrpTRESRunMins on an account or association
> to set hard quotas.
>
> On Jun 5, 2017 5:21 AM, "Jacob Chappell" <jacob.chapp...@uky.edu> wrote:
>
>> Hi Chris,
>>
>> Thank you very much for the details and clarification. It's unfortunate
>> that you can't have both fairshare and fixed quotas. I'll pass this
>> information along to my supervisors.
>>
>> Jacob Chappell
>>
>> On Sun, Jun 4, 2017 at 7:55 PM, Christopher Samuel <sam...@unimelb.edu.au
>> > wrote:
>>
>>>
>>> On 03/06/17 07:03, Jacob Chappell wrote:
>>>
>>> > Sorry, that was a mouthful, but important. Does anyone know if Slurm
>>> can
>>> > accomplish this for me. If so how?
>>>
>>> This was how we used to run prior to switching to fair-share.
>>>
>>> Basically you set:
>>>
>>> PriorityDecayHalfLife=0
>>>
>>> which stops the values decaying over time so once they hit their limit
>>> that's it.
>>>
>>> We also set:
>>>
>>> PriorityUsageResetPeriod=QUARTERLY
>>>
>>> so that limits would reset on the quarter boundaries.  This was because
>>> we used to have fixed quarterly allocations for projects.
>>>
>>> We went to fair-share because of a change of the funding model for us
>>> meant previous rules were removed and so we could go to fair-share which
>>> meant a massive improvement in utilisation (compute nodes were no longer
>>> idle with jobs waiting but unable to run because of being out of quota).
>>>
>>> NOTE: You can't have both fairshare and hard quotas at the same time.
>>>
>>> All the best,
>>> Chris
>>> --
>>>  Christopher SamuelSenior Systems Administrator
>>>  Melbourne Bioinformatics - The University of Melbourne
>>>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>>>
>>
>>


[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-05 Thread Douglas Jacobsen
I think you could still set GrpTRESRunMins on an account or association to
set hard quotas.

On Jun 5, 2017 5:21 AM, "Jacob Chappell"  wrote:

> Hi Chris,
>
> Thank you very much for the details and clarification. It's unfortunate
> that you can't have both fairshare and fixed quotas. I'll pass this
> information along to my supervisors.
>
> Jacob Chappell
>
> On Sun, Jun 4, 2017 at 7:55 PM, Christopher Samuel 
> wrote:
>
>>
>> On 03/06/17 07:03, Jacob Chappell wrote:
>>
>> > Sorry, that was a mouthful, but important. Does anyone know if Slurm can
>> > accomplish this for me. If so how?
>>
>> This was how we used to run prior to switching to fair-share.
>>
>> Basically you set:
>>
>> PriorityDecayHalfLife=0
>>
>> which stops the values decaying over time so once they hit their limit
>> that's it.
>>
>> We also set:
>>
>> PriorityUsageResetPeriod=QUARTERLY
>>
>> so that limits would reset on the quarter boundaries.  This was because
>> we used to have fixed quarterly allocations for projects.
>>
>> We went to fair-share because of a change of the funding model for us
>> meant previous rules were removed and so we could go to fair-share which
>> meant a massive improvement in utilisation (compute nodes were no longer
>> idle with jobs waiting but unable to run because of being out of quota).
>>
>> NOTE: You can't have both fairshare and hard quotas at the same time.
>>
>> All the best,
>> Chris
>> --
>>  Christopher SamuelSenior Systems Administrator
>>  Melbourne Bioinformatics - The University of Melbourne
>>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>>
>
>


[slurm-dev] Re: How to cleanup mysql db old records?

2017-05-25 Thread Douglas Jacobsen
Regarding the "more allocated time than is possible" messages, I'd suggest
checking for runaway jobs:

sacctmgr show runawayjobs

You might want to look at the records a bit before agreeing to let it fix
them automatically.  If that doesn't find anything, there might be some
nodes incorrectly down in the events tables (if I remember correctly).


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov



On Thu, May 25, 2017 at 8:21 AM, Balaji Deivam 
wrote:

> Hi,
>
> I am trying to cleanup the old records in the mysql DB which is present
> from Oct 2015.
>
> I used the settings below in the slurmdbd.conf file, but no records were
> cleaned up last night. Maybe it's not able to purge due to the huge number
> of records?
>
> How can we handle this situation? We want to keep only the records for one
> month. Can we delete them manually?
>
>
> *DB Size: *
>
> -rwxr-xr-x 1 mysql mysql *25813843968 *May 25 10:10 ibdata1
>
>
>
>
> *slurmdbd.conf:*
>
> PurgeEventAfter=550days
> PurgeJobAfter=550days
> PurgeResvAfter=550days
> PurgeStepAfter=550days
> PurgeSuspendAfter=550days
>
>
>
> *Slurmdbd.log:*
>
> [2017-05-24T18:00:11.429] error: We have more allocated time than is
> possible (857310 > 417600) for cluster cluster(116) from
> 2017-05-24T17:00:00 - 2017-05-24T18:00:00 tres 1
> [2017-05-24T18:00:11.446] Warning: Note very large processing time from
> hourly_rollup for cluster: usec=11415496 began=18:00:00.030
> [2017-05-24T19:00:11.355] error: We have more allocated time than is
> possible (838571 > 417600) for cluster cluster(116) from
> 2017-05-24T18:00:00 - 2017-05-24T19:00:00 tres 1
> [2017-05-24T19:00:11.369] Warning: Note very large processing time from
> hourly_rollup for cluster: usec=10920415 began=19:00:00.448
> [2017-05-24T20:00:11.218] error: We have more allocated time than is
> possible (860964 > 417600) for cluster cluster(116) from
> 2017-05-24T19:00:00 - 2017-05-24T20:00:00 tres 1
> [2017-05-24T20:00:11.239] Warning: Note very large processing time from
> hourly_rollup for cluster: usec=10867720 began=20:00:00.372
> [2017-05-24T21:00:11.192] error: We have more allocated time than is
> possible (803602 > 417600) for cluster cluster(116) from
> 2017-05-24T20:00:00 - 2017-05-24T21:00:00 tres 1
> [2017-05-24T21:00:11.207] Warning: Note very large processing time from
> hourly_rollup for cluster: usec=10964810 began=21:00:00.243
> [2017-05-24T22:00:11.394] error: We have more allocated time than is
> possible (799539 > 417600) for cluster cluster(116) from
> 2017-05-24T21:00:00 - 2017-05-24T22:00:00 tres 1
> [2017-05-24T22:00:11.408] Warning: Note very large processing time from
> hourly_rollup for cluster: usec=11197730 began=22:00:00.211
> [2017-05-24T23:00:11.715] error: We have more allocated time than is
> possible (787641 > 417600) for cluster cluster(116) from
> 2017-05-24T22:00:00 - 2017-05-24T23:00:00 tres 1
> [2017-05-24T23:00:11.727] Warning: Note very large processing time from
> hourly_rollup for cluster: usec=11315195 began=23:00:00.412
> [2017-05-25T00:00:11.617] error: We have more allocated time than is
> possible (794632 > 417600) for cluster cluster(116) from
> 2017-05-24T23:00:00 - 2017-05-25T00:00:00 tres 1
> [2017-05-25T00:00:11.632] Warning: Note very large processing time from
> hourly_rollup for cluster: usec=10902982 began=00:00:00.729
> [2017-05-25T00:01:14.089] Warning: Note very large processing time from
> daily_rollup for cluster: usec=62456333 began=00:00:11.632
> [2017-05-25T01:00:11.357] error: We have more allocated time than is
> possible (772316 > 417600) for cluster cluster(116) from
> 2017-05-25T00:00:00 - 2017-05-25T01:00:00 tres 1
> [2017-05-25T01:00:11.374] Warning: Note very large processing time from
> hourly_rollup for cluster: usec=11281362 began=01:00:00.092
>
>
> Thanks & Regards,
> Balaji
>


[slurm-dev] Re: Compute nodes drained or draining

2017-05-17 Thread Douglas Jacobsen
Batch job completion failure typically indicates an issue on the slurmd or
slurmstepd side of things that slurmctld is unsure how to deal with.  Try
checking your slurmd logs (debug level) on the impacted nodes.  Given the
asterisk in the sinfo output, I'm also guessing that slurmd exited.  There
may be a core file in the same directory as the log, which might be of use in
understanding the issue.

Doug

On May 17, 2017 5:52 AM, "Andy Riebs"  wrote:

>
> You can use "sinfo -R" to find out why they are being drained -- or at
> least a hint, depending on the situation.
>
> Andy
>
> On 05/17/2017 08:44 AM, Lennart Karlsson wrote:
>
>>
>> On 05/17/2017 10:08 AM, Baker D.J. wrote:
>>
>>> Hello,
>>>
>>> Quite recently I upgraded to Slurm 17.02.2. For some reason every day
>>> I've noticed that a number of compute nodes go into draining or drained
>>> state after running jobs. For example, this morning I see...
>>>
>>> [root@blue30 openmpi-1.10.1]# sinfo -Nl | grep drain
>>> red0165  1  batch*  draining  12  2:6:1  10  1  (null)  batch job complete f
>>> red0691  1  batch*  drained   12  2:6:1  10  1  (null)  batch job complete f
>>> red0692  1  batch*  draining  12  2:6:1  10  1  (null)  batch job complete f
>>> red0882  1  batch*  draining  12  2:6:1  10  1  (null)  batch job complete f
>>> red0895  1  batch*  drained   12  2:6:1  10  1  (null)  batch job complete f
>>> red0897  1  batch*  drained   12  2:6:1  10  1  (null)  batch job complete f
>>> red0901  1  batch*  draining  12  2:6:1  10  1  (null)  batch job complete f
>>>
>>> I never saw this behaviour with v16.05.2. Does anyone understand why
>>> this is happening, please? Is this a bug introduced in 17.02.2?
>>>
>>> Best regards,
>>> David
>>>
>>
>> Hello David,
>>
>> We also saw this when we had upgraded to v 17.02, and have the same
>> questions.
>>
>> Cheers,
>> -- Lennart Karlsson, UPPMAX, Uppsala University, Sweden
>>
>
> --
> Andy Riebs
> andy.ri...@hpe.com
> Hewlett-Packard Enterprise
> High Performance Computing Software Engineering
> +1 404 648 9024
> My opinions are not necessarily those of HPE
> May the source be with you!
>


[slurm-dev] Re: Adjusting MaxJobCount and SlurmctldPort settings

2017-05-16 Thread Douglas Jacobsen
Hello,

Changing the slurmctld port should probably wait until all jobs have stopped
running.  Running jobs won't fail outright in this case, but there is a good
chance they will fail to complete properly, and the compute nodes running them
might get stuck in the completing state (since the slurmstepd operating each
job would fail to communicate properly, owing to the change in port).

Changing a parameter like MaxJobCount should be safe in my opinion,
possibly even with just a change to slurm.conf followed by `scontrol
reconfigure`; though if that doesn't work you might need to restart
slurmctld.
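For example (the value is illustrative only):

    # slurm.conf
    MaxJobCount=50000
    # then: scontrol reconfigure (or restart slurmctld if the change is not
    # picked up)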

-Doug




Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov



On Tue, May 16, 2017 at 2:22 AM, Mark S. Holliman  wrote:

>
> Hi everyone,
>
> Does anyone know if changing the slurmctld settings for MaxJobCount and
> SlurmctldPort will cause jobs already running/waiting to fail?  My users
> have hit the default 10,000 queue limit, and I'd like to increase that, but
> not if it's going to kill everything that's running.  I know most settings
> can be changed (and slurmctld/slurmd restarted) without issue.  But given
> that scheduler changes can cause existing jobs to get killed I'm uncertain
> about these parameters...
>
> Cheers,
>   Mark
>
> ---
> Mark Holliman
> Wide Field Astronomy Unit
> Institute for Astronomy
> University of Edinburgh
> ---
>
>
>


[slurm-dev] Re: User accounting

2017-05-04 Thread Douglas Jacobsen
I typoed the example command: --gres should have been --tres.


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>
dmjacob...@lbl.gov



On Thu, May 4, 2017 at 7:02 AM, Douglas Jacobsen <dmjacob...@lbl.gov> wrote:

> You can also use sreport to get summaries (though it is limited)
>
> sreport user top users= --gres=cpu,mem
>
> Can include other limits like cluster, start, end, can group by account
> and so on.  Limitation is that the TopUsers report only ever shows the
> top10 users.  Would be nice to get top N users.
>
> 
> Doug Jacobsen, Ph.D.
> NERSC Computer Systems Engineer
> National Energy Research Scientific Computing Center
> <http://www.nersc.gov>
> dmjacob...@lbl.gov
>
>
>
> On Thu, May 4, 2017 at 6:58 AM, Swindelles, Ed <ed.swindel...@uconn.edu>
> wrote:
>
>> Hi Mahmood -
>>
>> I don’t believe SLURM has a direct way to generate that report. You can
>> collect all of the data necessary to create that report with the “sacct”
>> command, though. Then, use your favorite data analysis tools (Bash, Excel,
>> R, etc.) to aggregate rows and format it appropriately. Here’s an example
>> sacct command to dump all jobs for the last seven days, including columns
>> for some of the metrics you asked for:
>>
>> $ sacct -aXS $(date -d "-7 days" +%F) -oUser,JobID,State,Start,Elapsed,AllocCPUS,ReqMem,MaxDiskWrite
>>
>> (Note that this really is ALL jobs, so consider filtering by user or
>> account if you’ve got thousands/millions/etc. The man page for sacct is
>> very friendly.)
>>
>> I’ll also put in a plug for XDMoD. It is a powerful web app for getting
>> useful aggregate data from SLURM (and others). We use it quite a bit to
>> generate usage reports, mostly for administration. http://open.xdmod.org
>>
>> Best of luck,
>>
>> --
>> Ed Swindelles
>> Manager of Advanced Computing
>> University of Connecticut
>>
>> On May 4, 2017, at 9:08 AM, Mahmood Naderan <mahmood...@gmail.com> wrote:
>>
>> Hi,
>> I read the accounting page https://slurm.schedmd.com/accounting.html
>> however since it is quite large, I didn't get my answer!
>> I want to know the user stats for their jobs. For example, something like
>> this
>>
>>> time for all jobs including successful and not> > used>   ...
>>
>> Assume a user has submitted 2 jobs with the following specs:
>> job1: 10 minutes, 2.4GB memory, 4 cores, 1GB disk, success
>> job2: 15 minutes, 6GB memory, 2 cores, 2 GB disk, failed (due to his code
>> and not the system error)
>>
>> So the report looks like
>>
>> user, 2, 1, 25 min, 6, 8.4GB, 3GB, ...
>>
>> How can I get that?
>>
>> Regards,
>> Mahmood
>>
>>
>>
>>
>


[slurm-dev] Re: User accounting

2017-05-04 Thread Douglas Jacobsen
You can also use sreport to get summaries (though it is limited)

sreport user top users= --gres=cpu,mem

Can include other limits like cluster, start, end, can group by account and
so on.  Limitation is that the TopUsers report only ever shows the top10
users.  Would be nice to get top N users.


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov



On Thu, May 4, 2017 at 6:58 AM, Swindelles, Ed 
wrote:

> Hi Mahmood -
>
> I don’t believe SLURM has a direct way to generate that report. You can
> collect all of the data necessary to create that report with the “sacct”
> command, though. Then, use your favorite data analysis tools (Bash, Excel,
> R, etc.) to aggregate rows and format it appropriately. Here’s an example
> sacct command to dump all jobs for the last seven days, including columns
> for some of the metrics you asked for:
>
> $ sacct -aXS $(date -d "-7 days" +%F) -oUser,JobID,State,Start,Elapsed,AllocCPUS,ReqMem,MaxDiskWrite
>
> (Note that this really is ALL jobs, so consider filtering by user or
> account if you’ve got thousands/millions/etc. The man page for sacct is
> very friendly.)
>
> I’ll also put in a plug for XDMoD. It is a powerful web app for getting
> useful aggregate data from SLURM (and others). We use it quite a bit to
> generate usage reports, mostly for administration. http://open.xdmod.org
>
> Best of luck,
>
> --
> Ed Swindelles
> Manager of Advanced Computing
> University of Connecticut
>
> On May 4, 2017, at 9:08 AM, Mahmood Naderan  wrote:
>
> Hi,
> I read the accounting page https://slurm.schedmd.com/accounting.html
> however since it is quite large, I didn't get my answer!
> I want to know the user stats for their jobs. For example, something like
> this
>
> time for all jobs including successful and not>  used>   ...
>
> Assume a user has submitted 2 jobs with the following specs:
> job1: 10 minutes, 2.4GB memory, 4 cores, 1GB disk, success
> job2: 15 minutes, 6GB memory, 2 cores, 2 GB disk, failed (due to his code
> and not the system error)
>
> So the report looks like
>
> user, 2, 1, 25 min, 6, 8.4GB, 3GB, ...
>
> How can I get that?
>
> Regards,
> Mahmood
>
>
>
>


[slurm-dev] Re: inject arbitrary env variables in Slurm job

2017-01-26 Thread Douglas Jacobsen
Another way is to use a job_submit plugin, a lua-based one in particular;
that gives you a great deal of control, and it is applied at job submit time.

You can modify the job_request.env array to manipulate environment variables.


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov



On Thu, Jan 26, 2017 at 3:08 PM, Peter A Ruprecht <
peter.rupre...@colorado.edu> wrote:

> Ah, I can't believe I overthought this problem and overlooked just using
> the Prolog.  Thanks for the pointer.  Also thanks to the offline responders
> who suggested a spank plugin.
>
>
>
> This list is great!
>
>
>
> Pete
>
>
>
> *From: *Lyn Gerner 
> *Reply-To: *slurm-dev 
> *Date: *Thursday, January 26, 2017 at 4:05 PM
> *To: *slurm-dev 
> *Subject: *[slurm-dev] Re: inject arbitrary env variables in Slurm job
>
>
>
> Hi Pete,
>
>
>
> Follow the link from the Documentation page to the Prolog and Epilog Guide
> for how to inject a customized env variable, as well as the Environment
> Variables section of the sbatch man page, for the Slurm env vars relevant
> to #cores.
>
>
>
> Regards,
>
> Lyn
>
>
>
> On Thu, Jan 26, 2017 at 11:58 AM, Peter A Ruprecht <
> peter.rupre...@colorado.edu> wrote:
>
> Hi everyone,
>
>
>
> I'm trying to figure out a way to set environment variables into the
> environment that a Slurm job runs in, depending on the characteristics of
> the job.
>
>
>
> Here's the background:  our new cluster has Omni-Path interconnect, which
> uses hardware contexts that are associated with each MPI process or rank on
> the node.  We allow node sharing and in some cases when there are multiple
> MPI jobs on the same node (don't ask…) one job apparently uses up too many
> contexts and the other job crashes.
>
>
>
> So I'd like to set the PSM2_SHAREDCONTEXTS_MAX environment variable to an
> appropriate value for each job based on the number of cores or contexts
> available on the node and the number of cores requested by the job.
> Presumably the job_submit script would be the logical place to do this but
> I can't figure out how to set environment variables for the job in it.
>
>
>
> Any suggestions if this is the right track?  Other ideas?
>
>
>
> Thanks,
>
> Pete Ruprecht
>
> CU-Boulder Research Computing
>
>
>


[slurm-dev] Re: srun job launch time issue

2017-01-11 Thread Douglas Jacobsen
Are these sruns already in an allocation or not?  If not, you might 
consider setting PrologFlags=alloc in slurm.conf, which should perform 
much of the remote job setup when the head node is configured (presuming 
that might be your issue, or you have a configuration that might make 
that an issue).  Otherwise, checking slurmd logs (ideally with debug 
level logging) on the different nodes may give a clue.
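That is a one-line slurm.conf change, e.g.:

    # Run the prolog at job allocation time on all allocated nodes, rather
    # than lazily at first step launch on each node.
    PrologFlags=Alloc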


-Doug


On 1/11/17 11:23 AM, Pritchard Jr., Howard wrote:

Hi SLURM folks,

I recently got SLURM (16.05.6) set up on a small cluster (48 nodes 
x86_64 + Intel OPA)

and things appear to be nominal except for one odd performance problem
as far as srun launch times go.  I don’t observe this on other 
clusters running

SLURM at our site.

What I’m observing is that regardless of whether or not the 
application being
launched is a command (e.g. /bin/hostname) or an MPI application, I 
get reasonable
job launch times when using one node, but as soon as I use two or 
morenodes, there
is about a 10 second overhead to get the processes on the additional 
nodes started:


For example:

[hpp@hi-master ~]$ srun -n 8 -N 1 date

Wed Jan 11 12:11:29 MST 2017

Wed Jan 11 12:11:29 MST 2017

Wed Jan 11 12:11:29 MST 2017

Wed Jan 11 12:11:29 MST 2017

Wed Jan 11 12:11:29 MST 2017

Wed Jan 11 12:11:29 MST 2017

Wed Jan 11 12:11:29 MST 2017

Wed Jan 11 12:11:29 MST 2017


[hpp@hi-master ~]$ srun -n 8 -N 2 date

Wed Jan 11 12:10:35 MST 2017

Wed Jan 11 12:10:35 MST 2017

Wed Jan 11 12:10:35 MST 2017

Wed Jan 11 12:10:35 MST 2017

Wed Jan 11 12:10:44 MST 2017

Wed Jan 11 12:10:44 MST 2017

Wed Jan 11 12:10:44 MST 2017

Wed Jan 11 12:10:44 MST 2017


[hpp@hi-master ~]$ srun -n 8 -N 4 date

Wed Jan 11 12:10:57 MST 2017

Wed Jan 11 12:10:57 MST 2017

Wed Jan 11 12:11:07 MST 2017

Wed Jan 11 12:11:06 MST 2017

Wed Jan 11 12:11:07 MST 2017

Wed Jan 11 12:11:06 MST 2017

Wed Jan 11 12:11:07 MST 2017

Wed Jan 11 12:11:07 MST 2017


Anyone observed this problem before?


Any suggestions on how to resolve this problem would be much

appreciated.


Thanks,


Howard


--
Howard Pritchard
HPC-DES
Los Alamos National Laboratory





[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Douglas Jacobsen


There are other good reasons to use jobacct_gather/cgroup, in particular 
if memory enforcement is used.  jobacct_gather/linux will cause a job to 
be terminated if the summed memory exceeds the limit, which is OK so 
long as large-memory processes aren't forking and artificially 
increasing the apparent memory usage seen by jobacct_gather/linux as it 
sums up contributions through the /proc interface.  
jobacct_gather/cgroup, on the other hand, has much more reliable 
accounting of memory, even for workloads where large-memory processes 
(e.g., java) fork child processes.
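A hedged configuration sketch of the cgroup-based setup (exact plugin 
availability depends on how your Slurm was built; the cgroup.conf options 
shown are the common memory-enforcement ones):

    # slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup
    JobAcctGatherType=jobacct_gather/cgroup

    # cgroup.conf
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes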



On 12/15/16 2:22 PM, Christopher Samuel wrote:

On 16/12/16 02:15, Stefan Doerr wrote:


If I check on "top" indeed it shows all processes using the same amount
of memory. Hence if I spawn 10 processes and you sum usages it would
look like 10x the memory usage.

Do you have:

JobAcctGatherType=jobacct_gather/linux

or:

JobAcctGatherType=jobacct_gather/cgroup

If the former, try the latter and see if it helps get better numbers (we
went to the former after suggestions from SchedMD but from highly
unreliable memory had to revert due to similar issues to those you are
seeing).

Best of luck,
Chris


[slurm-dev] Re: Two partitions with same compute nodes

2016-11-29 Thread Douglas Jacobsen
One possible solution might be to implement a job_submit plugin (ideally 
using the lua interface).  You could check the gres request field, and 
if it includes a GPU request, then either force the user to the cuda 
partition, or deny the job if it isn't submitted to the cuda partition.


e.g., something like:

function slurm_job_submit(job_request, part_list, submit_uid)
    local gres = job_request.gres
    if (gres == nil) then gres = "" end
    for item in string.gmatch(gres, '([^,]+)') do
        local gresType = item
        local gresTypeStripped
        local gresValue
        gresTypeStripped, gresValue = string.match(item, "([^:]+):([^,]+)")
        if (gresTypeStripped ~= nil) then
            gresType = gresTypeStripped
        end
        if (gresType == "gpu") then
            -- if you want to force the user to the cuda partition
            job_request.partition = "cuda"
        end
    end
    return slurm.SUCCESS
end


Above is untested, and converted from something else, but you should be 
able to see what can be achieved in terms of managing policy with 
job_submit plugins.  A complete solution would also need to properly 
implement something for slurm_job_modify, potentially take different 
behaviors if submit_uid is 0, and so on.


-Doug

On 11/29/16 5:58 AM, Daniel Ruiz Molina wrote:

Yes, I have already configured two partitions. My slurmd.conf contains:

[...]
# RESOURCES
GresTypes=gpu

# COMPUTE NODES
NodeName=mynodes[1-20] CPUs=8 SocketsPerBoard=1 CoresPerSocket=4
ThreadsPerCore=2 RealMemory=7812 TmpDisk=50268
Gres=gpu:GeForceGTX480:1

# PARTITIONS
PartitionName=openmpi Nodes=mynodes[1-20] Default=YES
MaxTime=8:00:00 State=UP MaxCPUsPerNode=8
PartitionName=cuda Nodes=amynodes[11-15] MaxTime=INFINITE State=UP
[...]

And my gres.conf is:

NodeName=mynodes[11-15] Name=gpu Count=1 Type=GeForceGTX640
File=/dev/nvidia0 CPUs=0-7


With that, nodes "mynodes" 11, 12, 13, 14 and 15 belong to both 
partitions... but how does SLURM know that I won't use a GPU on mynode12 if I 
submit with "--partition openmpi --gres gpu:GeForceGTX480:1"?

Is, really, gpu resource assigned to cuda partition? where?

After doing some tests, I have been able to submit a batch job in both 
partition requesting a gpu resource...


Thanks.

El 29/11/2016 a las 14:36, Schmidtmann, Carl escribió:

On Nov 29, 2016, at 8:23 AM, Ole Holm Nielsen  
wrote:


On 11/29/2016 12:27 PM, Daniel Ruiz Molina wrote:

I would like to know if it would be possible in SLURM configure two
partition, composed by the same nodes, but one for using with GPUs and
the other one only for OpenMPI. This configuration was allowed in Sun
Grid Engine because GPU resource was assigned to the queue and to the
compute node, but in SLURM I have only found the way for assigning a GPU
resource to a compute node, independently if that compute belongs to
partition X or to partition Y.

That is actually how they recommended setting up GPU nodes in the Slurm docs a 
couple of years ago (maybe changed now). Make a ‘gpu’ partition with the nodes 
and another partition that contains the same nodes. The second partition can 
even limit the total number of CPUs per node to save at least one CPU per GPU 
for use in the GPU partition. We have put the GPU nodes into an ‘interactive’ 
partition because the ‘standard’ partition is heterogenous so we did not want 
to limit the CPUs available on any one node arbitrarily.

Carl







[slurm-dev] Re: problem configuring mvapich + slurm: "error: mpi/pmi2: failed to send temp kvs to compute nodes"

2016-11-18 Thread Douglas Jacobsen

Hello,

Is " /home/localsoft/slurm/spool" local to the node?  Or is it on the 
network?  I think each node needs to have separate data (like job_cred) 
stored there, and if each slurmd is competing for that file naming space 
I could imagine that srun could have problems. I typically use 
/var/spool/slurmd.


From the slurm.conf page:

"""

*SlurmdSpoolDir*
   Fully qualified pathname of a directory into which the *slurmd*
   daemon's state information and batch job script information are
   written. This must be a common pathname for all nodes, but should
   represent a directory which is local to each node (reference a local
   file system). The default value is "/var/spool/slurmd". Any "%h"
   within the name is replaced with the hostname on which the *slurmd*
   is running. Any "%n" within the name is replaced with the Slurm node
   name on which the *slurmd* is running. 


"""

I hope that helps,

Doug

On 11/18/16 1:07 AM, Janne Blomqvist wrote:

On 2016-11-17 12:53, Manuel Rodríguez Pascual wrote:

Hi all,

I keep having some issues using Slurm + mvapich2. It seems that I cannot
correctly configure Slurm and mvapich2 to work together. In particular,
sbatch works correctly but srun does not.  Maybe someone here can
provide me some guidance, as I suspect that the error is an obvious one,
but I just cannot find it.

CONFIGURATION INFO:
I am employing Slurm 17.02.0-0pre2 and mvapich 2.2.
Mvapich is compiled with "--disable-mcast --with-slurm="  <---there is a note about this at the bottom of the mail
Slurm is compiled with no special options. After compilation, I executed
"make && make install" in "contribs/pmi2/" (I read it somewhere)
Slurm is configured with "MpiDefault=pmi2" in slurm.conf

TESTS:
I am executing a "helloWorldMPI" that displays a hello world message and
writes down the node name for each MPI task.

sbatch works perfectly:

$ sbatch -n 2 --tasks-per-node=2 --wrap 'mpiexec  ./helloWorldMPI'
Submitted batch job 750

$ more slurm-750.out
Process 0 of 2 is on acme12.ciemat.es 
Hello world from process 0 of 2
Process 1 of 2 is on acme12.ciemat.es 
Hello world from process 1 of 2

$sbatch -n 2 --tasks-per-node=1 -p debug --wrap 'mpiexec  ./helloWorldMPI'
Submitted batch job 748

$ more slurm-748.out
Process 0 of 2 is on acme11.ciemat.es 
Hello world from process 0 of 2
Process 1 of 2 is on acme12.ciemat.es 
Hello world from process 1 of 2


However, srun fails.
On a single node it works correctly:
$ srun -n 2 --tasks-per-node=2   ./helloWorldMPI
Process 0 of 2 is on acme11.ciemat.es 
Hello world from process 0 of 2
Process 1 of 2 is on acme11.ciemat.es 
Hello world from process 1 of 2

But when using more than one node, it fails. Below there is the
experiment with a lot of debugging info, in case it helps.

(note that the job ID will be different sometimes as this mail is the
result of multiple submissions and copy/pastes)

$ srun -n 2 --tasks-per-node=1   ./helloWorldMPI
srun: error: mpi/pmi2: failed to send temp kvs to compute nodes
slurmstepd: error: *** STEP 753.0 ON acme11 CANCELLED AT
2016-11-17T10:19:47 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: acme11: task 0: Killed
srun: error: acme12: task 1: Killed


Slurmctld output:
slurmctld: debug2: Performing purge of old job records
slurmctld: debug2: Performing full system state save
slurmctld: debug3: Writing job id 753 to header record of job_state file
slurmctld: debug2: sched: Processing RPC: REQUEST_RESOURCE_ALLOCATION
from uid=500
slurmctld: debug3: JobDesc: user_id=500 job_id=N/A partition=(null)
name=helloWorldMPI
slurmctld: debug3:cpus=2-4294967294 pn_min_cpus=-1 core_spec=-1
slurmctld: debug3:Nodes=1-[4294967294] Sock/Node=65534
Core/Sock=65534 Thread/Core=65534
slurmctld: debug3:pn_min_memory_job=18446744073709551615
pn_min_tmp_disk=-1
slurmctld: debug3:immediate=0 features=(null) reservation=(null)
slurmctld: debug3:req_nodes=(null) exc_nodes=(null) gres=(null)
slurmctld: debug3:time_limit=-1--1 priority=-1 contiguous=0 shared=-1
slurmctld: debug3:kill_on_node_fail=-1 script=(null)
slurmctld: debug3:argv="./helloWorldMPI"
slurmctld: debug3:stdin=(null) stdout=(null) stderr=(null)
slurmctld: debug3:work_dir=/home/slurm/tests alloc_node:sid=acme31:11229
slurmctld: debug3:power_flags=
slurmctld: debug3:resp_host=172.17.31.165 alloc_resp_port=56804
other_port=33290
slurmctld: debug3:dependency=(null) account=(null) qos=(null)
comment=(null)
slurmctld: debug3:mail_type=0 mail_user=(null) nice=0 num_tasks=2
open_mode=0 overcommit=-1 acctg_freq=(null)
slurmctld: debug3:network=(null) begin=Unknown cpus_per_task=-1
requeue=-1 licenses=(null)
slurmctld: debug3:end_time= signal=0@0 wait_all_nodes=-1 cpu_freq=
slurmctld: debug3:ntasks_per_node=1 ntasks_per_socket=-1

[slurm-dev] Re: poor utilization, jobs not being scheduled

2016-10-29 Thread Douglas Jacobsen

Hello,

What you describe sounds like the backfill scheduler is not getting all 
the way through the queue. A simple adjustment (with some downsides) is 
to set bf_interval in your SchedulerParameters field of slurm.conf to 
something bigger than the default of 30s (I use 120).  Another important 
option is to set bf_continue to ensure it restarts where it left off in 
the list within the same backfill run.  If that is insufficient there 
are other things that can be done (with care) setting things like 
bf_max_users, bf_min_prio_reserve and other similar flags that you 
should only enable depending on the structure of your system, job 
submission patterns, and the expectations of your users and management.
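
For reference, a minimal sketch of how that could look in slurm.conf (the values are only an illustration, tune them to your own queue depth):

# slurm.conf -- illustrative backfill settings, not a recommendation
SchedulerType=sched/backfill
SchedulerParameters=bf_continue,bf_interval=120

An "scontrol reconfigure" should be enough to pick the change up.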


-Doug


On 10/29/16 11:56 AM, Vlad Firoiu wrote:

poor utilization, jobs not being scheduled
I'm trying to figure out why utilization is low on our university 
cluster. It appears that many cores are available, but a minimal 
resource 10 minute job has been waiting in queue for days. There 
happen to be some big high priority jobs at the front of the queue, 
and I've noticed that these are being constantly scheduled and 
unscheduled. Is this expected behavior? Might it be causing slurm to 
never reach lower priority jobs and consider them for scheduling/backfill?




[slurm-dev] Re: Reserved column on UserUtilizationByAccount sreports

2016-10-18 Thread Douglas Jacobsen
Reserved time in sreport is time nodes are held idle (by the backfill 
scheduler) to start the job.  If you aren't using backfill, or if all 
job submissions request about the same quantity of hardware resources 
then it may always be zero.  If there were some users submitting large 
jobs and some small, then I would expect there to be some non-zero time.


For the same time period, what does the Reserved column say for "sreport 
-T CPU -t MinPer cluster utilization"?  Meaning, not by account.  If 
that Reserved is 0 for the cluster overall, then that does explain why 
it's also zero for all accounts.  If there is a discrepancy, then 
perhaps there may be something to investigate.
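
For example, something along these lines (the date range is just a placeholder) gives the cluster-wide view to compare against:

# cluster-wide utilization for the same window; check its Reserved column
sreport -T CPU -t MinPer cluster utilization Start=2016-09-18 End=2016-10-18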


-Doug

On 10/18/16 2:36 AM, Albert Gil Moreno wrote:

Reserved column on UserUtilizationByAccount sreports
Hi,

It seems that right-now (or at least in version 15.08.9) the column 
Reserved in a UserUtilizationByAccount  sreport is always 0, like Idle 
and Down.


For example:

sreport -T CPU -t MinPer cluster UserUtilizationByAccount 
Format=TresName%4,Login,Used,Reserved,Idle,Down Start=`date -d "last 
month" +%D` End=now


Cluster/User/Account Utilization 2016-09-18T00:00:00 - 
2016-10-18T09:59:59 (2628000 secs)

Use reported in TRES Minutes/Percentage of Total

 TRES     Login           Used   Reserved       Idle       Down
 ---- --------- -------------- ---------- ---------- ----------
  cpu   pbellot  266612(9.77%)   0(0.00%)   0(0.00%)   0(0.00%)
  cpu    mpomar  157124(5.76%)   0(0.00%)   0(0.00%)   0(0.00%)
  cpu  mbellver   61747(2.26%)   0(0.00%)   0(0.00%)   0(0.00%)




For me it's clear that Down and Idle are values that make no sense to 
query "ByAccount", but Reserved could be seen as the time that an 
account has reserved the resources but not yet allocated them; that is, 
its time in the queue?


Does that make sense to you?
Is it possible to implement?

Thanks!


Albert





[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-10-11 Thread Douglas Jacobsen
FYI, sending SIGHUP to slurmctld is sufficient for rotating the
slurmctld.log file.  There is no need to actually restart it all the way.  It is
good to know the cause behind the deleted jobs.
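
As an illustration, a logrotate stanza along these lines would do it (the log and pid file paths are assumptions, adjust to your installation):

/var/log/slurm/slurmctld.log {
    weekly
    rotate 8
    compress
    missingok
    postrotate
        # ask slurmctld to reopen its log file instead of restarting it
        /bin/kill -HUP $(cat /var/run/slurmctld.pid) 2>/dev/null || true
    endscript
}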

Doug

On Oct 11, 2016 7:36 AM, "Ryan Novosielski"  wrote:

>
> Thanks for clearing that up. I was pretty sure there was no problem at all
> in using logrotate, and I know that restarting slurmctld does not
> ordinarily lose jobs.
>
> --
> 
> || \\UTGERS, |---*O*---
> ||_// the State  | Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
> `'
>
> > On Oct 11, 2016, at 06:19, Philippe  wrote:
> >
> > Hello all,
> > sorry for this long delay since my first post.
> > Thanks for all the answers, it helped me to make some tests, and after
> not so long, I realized I was using a personal script to launch the daemons, and
> I was still using my "debug" start line, which contains the startclean
> argument ...
> > So it's all my fault, Slurm did its job and started clean when the logrotate
> triggered it.
> >
> > Sorry for that !
> >
> > On Thu, Sep 29, 2016 at 2:05 PM, Janne Blomqvist <
> janne.blomqv...@aalto.fi> wrote:
> > On 2016-09-27 10:39, Philippe wrote:
> > > If I can't use logrotate, what must I use ?
> >
> > You can log via syslog, and let your syslog daemon handle the rotation
> > (and rate limiting, disk full, logging to a central log host and all the
> > other nice things that syslog can do for you).
> >
> >
> > --
> > Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
> > Aalto University School of Science, PHYS & NBE
> > +358503841576 || janne.blomqv...@aalto.fi
> >
> >
>


[slurm-dev] Re: QOS, Limits, CPUs and threads - something is wrong?

2016-10-03 Thread Douglas Jacobsen
Hi Lachlan,

You mentioned your slurm.conf has:
AccountingStorageEnforce=qos

The "qos" restriction only enforces that a user is authorized to use a
particular qos (in the qos string of the association in the slurm
database).  To enforce limits, you need to also use limits.  If you want to
prevent partial jobs from running and potentially being killed when a
resource runs out (only applicable for certain limits), you might also
consider setting "safe", e.g.,

AccountingStorageEnforce=limits,safe,qos

http://slurm.schedmd.com/slurm.conf.html#OPT_AccountingStorageEnforce
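
As a rough sketch tied to your 90-CPU goal (the QOS name "normal" and the TRES-style spelling of the limit are assumptions on my part, not a tested config):

# slurm.conf
AccountingStorageEnforce=limits,safe,qos

# cap each user at 90 CPUs' worth of running jobs in that QOS
sacctmgr modify qos set MaxTRESPerUser=cpu=90 where name=normal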

I hope that helps,
Doug


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov

- __o
-- _ '\<,_
--(_)/  (_)__


On Sun, Oct 2, 2016 at 9:08 PM, Lachlan Musicman  wrote:

> I started a thread on understand QOS, but quickly realised I had made a
> fundamental error in my configuration. I fixed that problem last week.
> (ref: https://groups.google.com/forum/#!msg/slurm-devel/
> dqL30WwmrmU/SoOMHmRVDAAJ )
>
> Despite these changes, the issue remains, so I would like to ask again,
> with more background information and more analysis.
>
>
> Desired scenario: That any one user can only ever have jobs adding up to
> 90 CPUs at a time. They can submit requests for more than this, but their
> running jobs will max out at 90 and the rest of their jobs will be put in
> queue. A CPU being defined as a thread in a system that has 2 sockets, each
> with 10 cores, each core with 2 threads. (ie, when I do cat /proc/cpuinfo
> on any node, it reports 40 CPUs, so we configured to utilize 40 CPUs)
>
> Current scenario: users are getting every CPU they have requested,
> blocking other users from the partitions.
>
> Our users are able to use 40 CPUs per node, so we know that every thread
> is available as a consumable resource, as we wanted.
>
> When I use sinfo -o %C, the per-CPU utilization results reflect that
> the thread is being used as the CPU measure.
>
> Yet, as noted above, when I do an squeue, I see that users have jobs
> running with more than 90 CPUs in total.
>
> squeue that shows allocated CPUs. Note that both running users have more
> than 90 CPUS each (threads):
>
> $ squeue -o"%.4C %8q %.8i %.9P %.8j %.8u %.8T %.10M %.9l"
> CPUS QOS       JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI
>    8 normal   193424      prod    Halo3 kamarasi  PENDING       0:00 1-00:00:00
>    8 normal   193423      prod    Halo3 kamarasi  PENDING       0:00 1-00:00:00
>    8 normal   193422      prod    Halo3 kamarasi  PENDING       0:00 1-00:00:00
>
>   20 normal   189360      prod MuVd_WGS lij@pete  RUNNING   23:49:15 6-00:00:00
>   20 normal   189353      prod MuVd_WGS lij@pete  RUNNING 4-18:43:26 6-00:00:00
>   20 normal   189354      prod MuVd_WGS lij@pete  RUNNING 4-18:43:26 6-00:00:00
>   20 normal   189356      prod MuVd_WGS lij@pete  RUNNING 4-18:43:26 6-00:00:00
>   20 normal   189358      prod MuVd_WGS lij@pete  RUNNING 4-18:43:26 6-00:00:00
>    8 normal   193417      prod    Halo3 kamarasi  RUNNING       0:01 1-00:00:00
>    8 normal   193416      prod    Halo3 kamarasi  RUNNING       0:18 1-00:00:00
>    8 normal   193415      prod    Halo3 kamarasi  RUNNING       0:19 1-00:00:00
>    8 normal   193414      prod    Halo3 kamarasi  RUNNING       0:47 1-00:00:00
>    8 normal   193413      prod    Halo3 kamarasi  RUNNING       2:08 1-00:00:00
>    8 normal   193412      prod    Halo3 kamarasi  RUNNING       2:09 1-00:00:00
>    8 normal   193411      prod    Halo3 kamarasi  RUNNING       3:24 1-00:00:00
>    8 normal   193410      prod    Halo3 kamarasi  RUNNING       5:04 1-00:00:00
>    8 normal   193409      prod    Halo3 kamarasi  RUNNING       5:06 1-00:00:00
>    8 normal   193408      prod    Halo3 kamarasi  RUNNING       7:40 1-00:00:00
>    8 normal   193407      prod    Halo3 kamarasi  RUNNING      10:48 1-00:00:00
>    8 normal   193406      prod    Halo3 kamarasi  RUNNING      10:50 1-00:00:00
>    8 normal   193405      prod    Halo3 kamarasi  RUNNING      11:34 1-00:00:00
>    8 normal   193404      prod    Halo3 kamarasi  RUNNING      12:00 1-00:00:00
>    8 normal   193403      prod    Halo3 kamarasi  RUNNING      12:10 1-00:00:00
>    8 normal   193402      prod    Halo3 kamarasi  RUNNING      12:21 1-00:00:00
>    8 normal   193401      prod    Halo3 kamarasi  RUNNING      12:40 1-00:00:00
>    8 normal   193400      prod    Halo3 kamarasi  RUNNING      17:02 1-00:00:00
>    8 normal   193399      prod    Halo3 kamarasi  RUNNING      21:03 1-00:00:00
>    8 normal   193396      prod    Halo3 kamarasi  RUNNING      22:01 1-00:00:00
>    8 normal   193394      prod    Halo3 kamarasi  RUNNING      23:40 1-00:00:00
>    8 normal 

[slurm-dev] Re: Backfill scheduler should look at all jobs

2016-08-23 Thread Douglas Jacobsen
Hello,

I'd recommend taking a look at bf_min_prio_reserve (a 16.05 feature).

The basic idea is that above the priority threshold it'll do the normal
backfill algorithm -- look at each job, in order, check to see if it can
run, if not, plan for it.  Below the threshold, it'll still go in order,
but will simply check if the job can start NOW without disrupting the
schedule for the reserving jobs.  This check is MUCH faster than performing
the planning and reserving of resources.

The basic idea here is that the high priority segment of the list
(presuming you're using aging, or other schemes that give relatively stable
priority values in terms of their behavior), really only changes
infrequently, whereas the lower portions of the priority list are
constantly changing.  If the bf scheduler has seen a job many times before
and couldn't start it then, what are the odds it'll start on this
iteration?  Low, unless the start time has arrived or jobs have ended
early.  Thus the *low* priority portion of the list is the high value
search space for finding backfill opportunities.

For this feature, you'll still need bf_continue as well as a bf_interval
that is reasonable for the number of jobs you want to reserve resources for
(60s works well for about ~500 jobs depending on your job mix, scale,
processor speed, etc).  Many of the bf_user and such options can be greatly
increased or disabled with this scheme.
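
A minimal sketch of the slurm.conf side, with a threshold value that is purely a placeholder you would pick to match your own priority ranges:

# reserve resources only for jobs at or above priority 100000;
# lower-priority jobs are only tested for an immediate start
SchedulerParameters=bf_continue,bf_interval=60,bf_min_prio_reserve=100000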

-Doug


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov

- __o
-- _ '\<,_
--(_)/  (_)__


On Tue, Aug 23, 2016 at 12:54 AM, Ulf Markwardt  wrote:

> Hello Christopher,
>
> > Isn't this what bf_continue is for?
> Not really: bf_continue just lets the bf scheduler continue at the same
> position where it was interrupted (e.g. for job submissions). So it does
> not consider the new jobs but crawls down first. (These interrupts are
> controlled by bf_yield_interval and bf_yield_sleep.)
>
> Once bf_interval is reached the bf scheduler starts at the top:
> "The backfill scheduler will start over after reaching this time limit
> (including time spent sleeping), even if the maximum job counts have not
> been reached."
>
> Ulf
>
> --
> ___
> Dr. Ulf Markwardt
>
> Technische Universität Dresden
> Center for Information Services and High Performance Computing (ZIH)
> 01062 Dresden, Germany
>
> Phone: (+49) 351/463-33640  WWW:  http://www.tu-dresden.de/zih
>
>


[slurm-dev] Re: SLURM job's email notification does not work

2016-08-18 Thread Douglas Jacobsen
Email is only sent by slurmctld, you'll need to change slurm.conf there and
at least do an `scontrol reconfigure`, then perhaps it'll start working.
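
In other words, something like this on the host running slurmctld (the mailer path is just an example):

# slurm.conf on the slurmctld host
MailProg=/usr/bin/mailx

# then, as the Slurm admin on that host
scontrol reconfigure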

-Doug


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 

- __o
-- _ '\<,_
--(_)/  (_)__


On Thu, Aug 18, 2016 at 6:23 AM, Fatih Öztürk 
wrote:

> Dear Christian,
>
> What I did now:
>
> 1) Changed computenode3 status to DRAIN
> 2) Changed slurm.conf only on computenode3. Added MailProg=/usr/bin/mailx
> at the end of the slurm.conf
> 3)Restarted munged and slurmd services
> 4) Changed computenode3 status to RESUME
>
> With both root and my own user id, email is still not being sent from
> computenode3.
>
> Regards.
>
>
>
> Fatih Öztürk
> Engineering Information Technologies
> Information Technologies
>
> fatih.ozt...@tai.com.tr
>
>
> TAI - Turkish Aerospace Industries, Inc.
> Address: Fethiye Mh. Havacılık Blv. No:17 06980 Kazan – ANKARA / TURKEY
> Tel: +90 312 811 1800-3179 // Fax: +90 312 811 1425  www.tai.com.tr
>
>
>
> -Original Message-
> From: Christian Goll [mailto:christian.g...@h-its.org]
> Sent: Thursday, August 18, 2016 3:59 PM
> To: slurm-dev
> Subject: [slurm-dev] Re: SLURM job's email notification does not work
>
>
> Hello Fatih,
> did you set the variable MailProg to right value, e.g.
> MailProg=/usr/bin/mailx
> in slurm.conf?
>
> kind regards,
> Christian
> On 18.08.2016 14:44, Fatih Öztürk wrote:
> > Hello,
> >
> >
> > I have a problem about email notification with jobs. I would be
> > appreciate if you could help me.
> >
> >
> > We have a SLURM cluster: 1 Head Node and about 20 Compute Nodes.
> > User's run their jobs within only on head node with their own
> credentials.
> >
> >
> > As an example, if i run a job like below on the head node;
> >
> >
> > [t15247@headnode ] # srun -n1 -w "computenode3" --mail-type=ALL
> > --mail-user=fatih.ozt...@tai.com.tr /bin/hostname
> >
> >
> > it runs successfully; however, no Slurm notification email is sent to the user.
> >
> >
> > However, this user can send email manually with "mailx" successfully
> > from both on headnode and computenode3.
> >
> >
> > Would you please help me about this problem?
> >
> >
> > Note: Remote Microsoft Exchange SMTP server information is set in
> > /etc/mail.rc
> >
> >
> > Regards,
> >
> >
> > *Fatih Öztürk*
> >
> > TAI, Turkish Aerospace Industries Inc.
> >
> > Engineering IT
> >
> > +90 312 811 18 00/ 3179
> >
>
> --
> Dr. Christian Goll
> HITS gGmbH
> Schloss-Wolfsbrunnenweg 35
> 69118 Heidelberg
> Germany
> Phone: +49 6221 533 230
> Fax: +49 6221 533 230
> 
> Amtsgericht Mannheim / HRB 337446
> Managing Director: Dr. Gesa Schönberger
>


[slurm-dev] Re: SPANK prolog not run via sbatch (bug?)

2016-07-08 Thread Douglas Jacobsen
Hello,

Do you have "PrologFlags=alloc" in slurm.conf?  If not, you'll need it,
otherwise the privileged prologs won't run until the first step is executed
on a node.
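
That is, something along these lines in slurm.conf (depending on your version you may need to restart the daemons rather than just reconfigure for this to take effect):

# run the privileged prologs at allocation time, not at first step launch
PrologFlags=alloc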

-Doug


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov

- __o
-- _ '\<,_
--(_)/  (_)__


On Fri, Jul 8, 2016 at 11:20 AM, Yong Qin  wrote:

> Hi,
>
> We implemented our own private /tmp solution via a spank plugin. The
> implementation was developed and tested on 15.08.6 and it went well.
> However when we move it to the production system which we just upgraded to
> 15.08.9, it appears that the slurm_spank_job_prolog() and
> slurm_spank_task_init_privileged() functions are not executed if the job is
> submitted via sbatch but slurm_spank_job_epilog() is. This is all fine if
> the job is submitted via srun though.
>
> I tried to search it in the bugzilla but couldn't find any report on it,
> or maybe I'm not searching with the right keywords? If it is an existing
> bug can anybody provide a pointer to it? If it's not a known bug, I'm
> wondering if other sites are seeing the same behavior as we do.
>
> Thanks,
>
> Yong Qin
>


[slurm-dev] RE: An issue with Grid Engine to Slurm migration

2016-05-05 Thread Douglas Jacobsen
Hello,

As for allowing users to specify defaults for their sbatch/salloc
executions, I think the best analog in SLURM to the GridEngine .sge_request
is for users to set environment variables in their dotfiles or other means
prior to running sbatch.  e.g., SBATCH_ACCOUNT would set an implicit "-A"
on the sbatch statement.  See the sbatch man page under "INPUT ENVIRONMENT
VARIABLES" for more information.

-Doug


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 

- __o
-- _ '\<,_
--(_)/  (_)__


On Fri, Apr 29, 2016 at 5:45 PM, Luis Yanes (TGAC) 
wrote:

>
> Hi,
>
> Depending on what those options are, I guess the best way to go would be
> to place them on the Prolog of the job. Here is a link to the documentation
> http://slurm.schedmd.com/prolog_epilog.html
>
> Best,
> Luis.
> 
> From: CB [cbalw...@gmail.com]
> Sent: 29 April 2016 20:31
> To: slurm-dev
> Subject: [slurm-dev] An issue with Grid Engine to Slurm migration
>
> Hello,
>
> We're using the Grid Engine "sge_request" file to add some default
> options, which are appended to users' job submission commands.
> However, if users specify the same option with a different value in their own
> job submission command, they can override the default options.
>
> I'm looking for something similar with Slurm.
> I've read Slurm documentations but it is not obvious to me yet.
>
> Any suggestions are appreciated.
>
> Thanks,
> - Chansup


[slurm-dev] Re: Need to restart slurmctld when adding user to accounting

2016-03-30 Thread Douglas Jacobsen
Sorry, you just said they were; I somehow misread that.  Try increasing
logging level, perhaps the easiest way is running slurmctld and slurmdbd
interactively with the -Dvvv arguments.  Then add a user and see if any
errors occur, particularly on the slurmctld side after the sacctmgr update
is done.
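
Concretely, something like this, run in two separate terminals as the account the daemons normally run under:

# on the database host
slurmdbd -Dvvv

# on the controller host
slurmctld -Dvvv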

slurmdbd will send the accounting update to slurmctld slightly after
sacctmgr returns.


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>
dmjacob...@lbl.gov

- __o
-- _ '\<,_
--(_)/  (_)__


On Wed, Mar 30, 2016 at 5:38 PM, Douglas Jacobsen <dmjacob...@lbl.gov>
wrote:

> Are both slurmdbd and slurmctld running as the same UID?  (if not they
> need to be, I believe you can see the errors on slurmdbd debug2 or debug3)
>
>
>
> 
> Doug Jacobsen, Ph.D.
> NERSC Computer Systems Engineer
> National Energy Research Scientific Computing Center
> <http://www.nersc.gov>
> dmjacob...@lbl.gov
>
> - __o
> -- _ '\<,_
> --(_)/  (_)__
>
>
> On Wed, Mar 30, 2016 at 5:32 PM, Terri Knight <tlkni...@ucdavis.edu>
> wrote:
>
>>
>> I posted earlier (Dec 28, 2015) about this issue and was told to check
>> that the slurmdbd and slurmctl daemons were running as the same user- they
>> weren't at that time. I thought making that change would resolve the
>> problem but it did not.
>>
>> These daemons are now both running as root
>> root  6463 1  0 17:01 ?00:00:00
>> /share/apps/slurm-15.08.8/sbin/slurmdbd
>> root  6743 1  0 17:05 ?00:00:00
>> /share/apps/slurm-15.08.8//sbin/slurmctld
>>
>> on the compute node:
>> root  7874 1  0 17:03 ?00:00:00
>> /share/apps/slurm-15.08.8//sbin/slurmd
>>
>> Upon further testing, I only need to restart the slurmctld daemon to get the
>> new user added such that he can run a job. So not as big a deal to me now
>> but it is different than in older versions of slurm.
>>
>> I'm adding a new user to an existing account and before I restart
>> slurmctld I see this in the slurmctld log when I try to "srun date" as that
>> user:
>>
>> [2016-03-30T17:04:50.107] error: User 9101 not found
>> [2016-03-30T17:04:50.107] _job_create: invalid account or partition for
>> user 9101, account '(null)', and partition 'debug'
>> [2016-03-30T17:04:50.142] _slurm_rpc_allocate_resources: Invalid account
>> or account/partition combination specified
>> [2016-03-30T17:05:11.381] Terminate signal (SIGINT or SIGTERM) received
>>
>> Oddly the account is "null"
>>
>> Here is the command to add the user,
>> sacctmgr add user johndoe defaultaccount=boris
>> partition=low,med,high,debug cluster=jane
>>
>> slurm-15.08.8 on Ubuntu 14.04.4
>>
>> Like I said, I can live with it since its only 1 restart.
>>
>> Thanks,
>> Terri
>>
>
>


[slurm-dev] Re: Need to restart slurmctld when adding user to accounting

2016-03-30 Thread Douglas Jacobsen
Are both slurmdbd and slurmctld running as the same UID?  (if not they need
to be, I believe you can see the errors on slurmdbd debug2 or debug3)




Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov

- __o
-- _ '\<,_
--(_)/  (_)__


On Wed, Mar 30, 2016 at 5:32 PM, Terri Knight  wrote:

>
> I posted earlier (Dec 28, 2015) about this issue and was told to check
> that the slurmdbd and slurmctl daemons were running as the same user- they
> weren't at that time. I thought making that change would resolve the
> problem but it did not.
>
> These daemons are now both running as root
> root  6463 1  0 17:01 ?00:00:00
> /share/apps/slurm-15.08.8/sbin/slurmdbd
> root  6743 1  0 17:05 ?00:00:00
> /share/apps/slurm-15.08.8//sbin/slurmctld
>
> on the compute node:
> root  7874 1  0 17:03 ?00:00:00
> /share/apps/slurm-15.08.8//sbin/slurmd
>
> Upon further testing, I only need to restart the slurmctld daemon to get the
> new user added such that he can run a job. So not as big a deal to me now
> but it is different than in older versions of slurm.
>
> I'm adding a new user to an existing account and before I restart
> slurmctld I see this in the slurmctld log when I try to "srun date" as that
> user:
>
> [2016-03-30T17:04:50.107] error: User 9101 not found
> [2016-03-30T17:04:50.107] _job_create: invalid account or partition for
> user 9101, account '(null)', and partition 'debug'
> [2016-03-30T17:04:50.142] _slurm_rpc_allocate_resources: Invalid account
> or account/partition combination specified
> [2016-03-30T17:05:11.381] Terminate signal (SIGINT or SIGTERM) received
>
> Oddly the account is "null"
>
> Here is the command to add the user,
> sacctmgr add user johndoe defaultaccount=boris
> partition=low,med,high,debug cluster=jane
>
> slurm-15.08.8 on Ubuntu 14.04.4
>
> Like I said, I can live with it since its only 1 restart.
>
> Thanks,
> Terri
>


[slurm-dev] Re: MaxTRESMins limit on a job kills a running job -- is it meant to?

2016-01-07 Thread Douglas Jacobsen
I think you probably want to add "safe" to AccountingStorageEnforce in
slurm.conf;  that should prevent it from starting jobs that would exceed
association limits.
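
A minimal sketch of that slurm.conf line, assuming you already enforce limits:

# 'safe' makes the scheduler refuse to start a job unless it can finish
# within the relevant limits, instead of killing it mid-run
AccountingStorageEnforce=limits,safe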


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov

- __o
-- _ '\<,_
--(_)/  (_)__


On Thu, Jan 7, 2016 at 7:15 AM, Lennart Karlsson 
wrote:

>
> We have set the MaxTRESMins limit on accounts and users, to make it
> impossible to start what we think is outrageously large jobs.
>
> But we have found an unwanted side effect:
> When the user asks for a longer timelimit, we often allow that, and
> when we increase the timelimit, sometimes jobs run into the
> MaxTRESMins limit and die:
> Dec 28 17:20:18 milou-q slurmctld: [2015-12-28T17:20:09.072] Job 6574528
> timed out, the job is at or exceeds assoc 10056(b2013086/ansgar/(null)) max
> tres(cpu) minutes of 60 with 61
>
> For us, this looks like a bug.
>
> Please, we would prefer the MaxTRESMins limit not to kill already
> running jobs.
>
> Cheers,
> -- Lennart Karlsson
>UPPMAX, Uppsala University, Sweden
>http://www.uppmax.uu.se
>


[slurm-dev] Re: need to restart slurm daemons for accounting changes

2015-12-28 Thread Douglas Jacobsen
I'm betting that slurmctld is running as a different uid than slurmdbd.
Once both are running as the same uid, slurmctld will start taking updates
from slurmdbd (via sacctmgr).

-Doug


On Mon, Dec 28, 2015 at 2:51 PM, Terri Knight  wrote:

> Since upgrading to slurm 15.08.1 on Ubuntu 14.04.3  it is required to
> restart the mysql, slurmdbd, and slurmctld daemons before a new user receives
> access to submit a job (accounting enabled).
>
> $ sacctmgr add user ptrimmer defaultaccount=adamgrp partition=serial
> cluster=farm
>
> $ sacctmgr dump farm
> ...
> User - 'ptrimmer':Partition='serial':DefaultAccount='adamgrp':Fairshare=1
> ...
>
> As user ptrimmer:
> $ srun -p serial date
> srun: error: Unable to allocate resources: Invalid account or
> account/partition combination specified
>
> On the slurm server as root:
> # service mysql stop
> mysql stop/waiting
> #service slurm-llnl-slurmdbd stop
>  * Stopping slurm-llnl database server interface
>[ OK ]
> # service slurm-llnl stop
>  * Stopping slurm central management daemon slurmctld
> [ OK ]
> slurmctld is stopped
> #  service mysql start
> mysql start/running, process 6270
> # service slurm-llnl-slurmdbd start
>  * Starting slurm-llnl database server interface
>[ OK ]
> #  service slurm-llnl start
>  * Starting slurm central management daemon slurmctld
>
> Back to user ptrimmer:
> ptrimmer@farm:~$  srun -p serial date
> srun: job 5898165 queued and waiting for resources
> srun: job 5898165 has been allocated resources
> Mon Dec 28 12:57:20 PST 2015
>
> I also tried running
> $ scontrol reconfig
> on the slurm server before restarting the slurm daemons but that did not
> help.
>
> Is this proper? In slurm 2.6 I did not have to do this.
>
> thanks,
> Terri Knight
>


[slurm-dev] Re: Fwd: SLURM : how to have a round-robin across nodes based on load average?

2015-11-18 Thread Douglas Jacobsen
Check out the LLN partition configuration option.  Least loaded node
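
A sketch of a partition definition with it turned on (the partition and node names are placeholders):

# slurm.conf -- hand new jobs to the least-loaded eligible node first
PartitionName=batch Nodes=node[01-10] Default=YES MaxTime=INFINITE State=UP LLN=YES
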
On Nov 18, 2015 6:40 PM, "cips bmkg"  wrote:

> Hi,
>
> If you generate a lot of mono-core sequential tasks, the regular SLURM
> allocation would pile them up onto the first node, then the second,
> etc...
>
> The last node would (almost) never be used.
>
> Hence the idea to make it automatically distributed across nodes, one job
> at a time.
>
> Cheers,
>
> Remi
>
> On Wed, Nov 18, 2015 at 6:38 PM, Daniel Letai  wrote:
>
>> I'm curious - what would be the point of such scheduling?
>> I tried to think about a scenario in which such a setting would gain me
>> anything significant and came up with nothing. What is the advantage of
>> this distribution?
>>
>> On 11/18/2015 08:37 AM, cips bmkg wrote:
>>
>>
>> -- Forwarded message --
>> From: cips bmkg 
>> Date: Wed, Nov 18, 2015 at 12:11 PM
>> Subject: SLURM : how to have a round-robin across nodes based on load
>> average?
>> To: slurm-dev@schedmd.com
>>
>>
>> Hi,
>>
>> As a former user of SGE, I was used to SGE distributing jobs to nodes
>> that had not been used recently (based on load average).
>>
>> I can see that the round robin distribution is only done for intra
>> node... while an inter-node setting would probably have gotten me what I
>> want.
>>
>> Can anyone advice how to set it up?
>>
>> thanks
>>
>>
>>
>>
>


[slurm-dev] Re: NERSC shifter

2015-11-12 Thread Douglas Jacobsen
Hello,

An early release of the software should be available starting next week!
We're trying to get the final pieces in place (sans documentation) before
SC.  I'll send a notification to this list once it is available.

Sorry for the delays!
-Doug


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 

- __o
-- _ '\<,_
--(_)/  (_)__


On Thu, Nov 12, 2015 at 4:24 AM, Michael Gutteridge <
michael.gutteri...@gmail.com> wrote:

> Hi all
>
> I saw the presentation about Shifter from this year's user group meeting-
> since thin I've been stalking the NERSC site looking for updates and (hope
> of hopes) a download.  I'm guessing its still working through its early
> stages of development.
>
> Would anyone have any idea of progress or a possible release date?  Or a
> NERSC contact I could ask?  We're getting lots of interest in docker images
> on our Slurm cluster and this seems to address a lot of our concerns.
>
> Thanks
>
> Michael
>
>


[slurm-dev] Re: Partition QoS

2015-11-10 Thread Douglas Jacobsen
Hi Paul,

I did this by creating the qos, e.g. sacctmgr create qos part_whatever
Then in slurm.conf setting qos=part_whatever in the "whatever" partition
definition.

scontrol reconfigure

finally, set the limits on the qos:
sacctmgr modify qos set MaxJobsPerUser=5 where name=part_whatever
...
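
Putting the pieces together, a sketch using Paul's numbers (the partition and node names are placeholders, and MaxTRESPerUser=cpu= is the TRES-era spelling of MaxCPUsPerUser):

# 1) create the QOS and attach the limits
sacctmgr create qos part_whatever
sacctmgr modify qos set MaxJobsPerUser=5 MaxSubmitJobsPerUser=5 \
    MaxTRESPerUser=cpu=128 where name=part_whatever

# 2) reference it from the partition in slurm.conf, then 'scontrol reconfigure'
PartitionName=whatever Nodes=node[01-16] QOS=part_whatever State=UP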

-Doug


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 

- __o
-- _ '\<,_
--(_)/  (_)__


On Tue, Nov 10, 2015 at 2:29 PM, Paul Edmon  wrote:

>
> In 15.08 you are able to set QoS limits directly on a partition.  So how
> do you actually accomplish this?  I've tried a couple of ways, but no
> luck.  I haven't seen a demo of how to do this anywhere either.  My goal is
> to set up a partition with the following QoS parameters:
>
> MaxJobsPerUser=5
> MaxSubmitJobsPerUser=5
> MaxCPUsPerUser=128
>
> Thanks for the info.
>
> -Paul Edmon-
>


[slurm-dev] Re: scripted use of sacctmgr

2015-10-14 Thread Douglas Jacobsen
I just went through this exercise to integrate the SLURM database with our
site database.  I found that preparing a file (or string) like:

add user abc account=aaa
add user def account=bbb
add user def account=aaa
...
...
exit

and then piping that to stdin of "sacctmgr -i" gave an easy way to bulk
update lots of data without running sacctmgr once per change.  Note the "exit" to escape the loop =)
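
A small sketch of the shell side (the account and user names are made up):

# feed a batch of commands to sacctmgr on stdin;
# -i / --immediate commits the changes without prompting
cat <<'EOF' | sacctmgr -i
add user abc account=aaa
add user def account=bbb
exit
EOF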

-Doug


- __o
-- _ '\<,_
--(_)/  (_)__


On Wed, Oct 14, 2015 at 2:53 AM, Thomas Orgis 
wrote:

>
> On Tue, 13 Oct 2015 09:59:14 -0700,
> Ian Logan  wrote:
>
> > Hi Thomas,
> > We use sacctmgr -i  in our scripts to remove the prompt, works
> > great.
> > Thanks,
>
> Thank _you_! I only looked for something more generic along the lines of
> "non-interactive". The --immediate switch works nicely.
>
> Though one still would like to do something about that prompt loop when
> there is no terminal to get input from. Maybe I can prepare a patch
> when I am less busy with getting things running.
>
>
> Alrighty then,
>
> Thomas
>
> --
> Dr. Thomas Orgis
> Universität Hamburg
> RRZ / Zentrale Dienste / HPC
> Schlüterstr. 70
> 20146 Hamburg
> Tel.: 040/42838 8826
> Fax: 040/428 38 6270
>


[slurm-dev] RE: Distribute M jobs on N nodes without duplication

2015-10-02 Thread Douglas Jacobsen
Hi, I'm not sure I understand the problem but you can specify -N (--nodes)
and tasks and so on for each srun.  That way you can control how many nodes
and tasks are distributed per srun:

srun -N 1 --gres=gpu:1 ...
srun -N 1 --gres=gpu:1 ...

from your original example should work..
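
For completeness, a sketch of how that could look inside a batch script if the two steps are meant to run at the same time (the program names and GPU counts are placeholders; the '&' plus 'wait' is what lets the steps run side by side):

#!/bin/bash
#SBATCH -N 2
#SBATCH --gres=gpu:1

# one single-node step per program, launched concurrently
srun -N 1 -n 1 --gres=gpu:1 ./program_a &
srun -N 1 -n 1 --gres=gpu:1 ./program_b &
wait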

-Doug



On Fri, Oct 2, 2015 at 2:11 PM, DIAM code distribution DIAM/CDRH/FDA <
diamc...@gmail.com> wrote:

> Can anyone please help with how to achieve this very basic kind of job
> distribution?  This problem has not been solved yet.
>
> On Fri, Oct 2, 2015 at 12:49 PM, John Hearns 
> wrote:
>
>> I stand corrected.
>>
>>
>>
>> I find myself in a maze of twisty little passages, all alike
>>
>>
>>
>> All the examples for SBATCH (in the SLURM manual) use 'SRUN' for
>> execution of runs.  There are lots of other websites which give SBATCH
>> examples, and all of them use SRUN unless using some version of MPI.
>>
>>
>>
>>
>>
>
>


[slurm-dev] epilogue / health check races

2015-04-02 Thread Douglas Jacobsen
Hi all,

I saw post earlier today (or yesterday) about jobs in a dependency chain
starting while the prior job epilogue is still running.  I have a related,
but more general case of this.

I've been using a test configuration of slurm on a Cray XC30 in hybrid
mode.  I've seen that the end-of-reservation nodehealthcheck (a Cray thing)
will often run at the same time as, or before a spank plugin epilogue
runs.  This generates a race between the two - especially since I use the
nodehealthcheck to validate that the epilogue properly cleaned up the job.

Is it feasible to run the job/spank epilogues *before* releasing the
resources?
Or, is this already the behavior and I'm misdiagnosing this.

Thanks,
Doug


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center http://www.nersc.gov
dmjacob...@lbl.gov

- __o
-- _ '\<,_
--(_)/  (_)__


[slurm-dev] spank plugin development: is it possible to save derived state in a job

2015-03-29 Thread Douglas Jacobsen
Hello,

I'm working on a SPANK plugin which needs to accept input from the user
(i.e., from a command line option), perform some non-trivial work during
the allocator context that may take some time, and then upon success allow
the job to continue.  Later steps, for example in the job plugin, task
initialization routine, and job epilog, will need the results of that
non-trivial work performed within the allocation context.

As far as I can tell, each stage can access the original user inputs,
either by directly accessing them in the allocator or local contexts, or
via spank_option_getopt in the prolog/epilog contexts.  I have not found a
way to store the results of my data transformation in the job somehow,
either through the job environment (spank_setenv), or using the job_control
environment (spank_job_control_setenv).  It seems that the results of
processing at each stage are confined to that stage.

Is it possible for me to somehow perform a calculation in the allocator
context, and then access the results in other contexts without going back
through the original arguments?

Thanks so much,
Doug


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center http://www.nersc.gov
dmjacob...@lbl.gov

- __o
-- _ '\<,_
--(_)/  (_)__