[slurm-dev] Re: Send notification email

2016-09-28 Thread Eckert, Phil
If I understand your question, you can set it in the slurm.conf file; the default is: MailProg = /usr/bin/mail From: Fanny Pagés Díaz > Reply-To: slurm-dev > Date: Wednesday, September 28, 2016 at 11:45
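
To check which mail program a given configuration file would select, the MailProg line can be read straight out of slurm.conf. This is a minimal sketch, assuming a single plain `MailProg=...` line and no include files; the helper name `mailprog_from_conf` is hypothetical, and the fallback value is simply the default quoted in the reply above.

```shell
# Print the MailProg value from a slurm.conf-style file.
# Falls back to /usr/bin/mail, the default quoted in the reply above.
# mailprog_from_conf is a hypothetical helper, not a Slurm command.
mailprog_from_conf() (
    conf=$1
    # Match "MailProg = value" case-insensitively; commented-out lines do not match.
    value=$(sed -n 's/^[[:space:]]*[Mm][Aa][Ii][Ll][Pp][Rr][Oo][Gg][[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$conf" | tail -n 1)
    echo "${value:-/usr/bin/mail}"
)
```

It would be called as `mailprog_from_conf /path/to/slurm.conf`, where the path is whatever the site uses. On a running cluster, `scontrol show config` reports the value the controller has actually loaded, which is the more reliable check.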

[slurm-dev] Re: slum in the nodes not working

2015-12-21 Thread Eckert, Phil
Make sure the slurm.conf file is identical on all nodes. If the slurmctld is running and all the slurmds are running, take a look at the slurmctld.log; it should provide some clues. If not, you might want to post the content of your slurm.conf file. Phil Eckert LLNL From: Fany Pagés Díaz

[slurm-dev] Re: How can I send a mail when I finished a job?

2015-12-18 Thread Eckert, Phil
I believe that all that is happening in regard to mail is that the slurmctld is executing the mail utility with the standard arguments. Is mail set up on the node the slurmctld is running on? A quick test would be to log in there and manually send yourself email. Phil Eckert LLNL On 12/18/15,

[slurm-dev] Re: A floating exclusive partition

2015-11-19 Thread Eckert, Phil
A possibility might be to do this using reservations. You could create a 5 node reservation with all concerned users having access, then have a script run by cron that periodically checks the state of the nodes in the reservation; if any go down, update the reservation, replacing the down nodes
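
The node-swapping step of such a cron script could look roughly like the sketch below. It only computes the new node list; the helper name `replace_down_nodes`, the node names, and the idea of keeping a list of spare nodes are all assumptions made for illustration, and the spares are assumed to be healthy.

```shell
# replace_down_nodes RESERVED DOWN SPARES
# Each argument is a space-separated list of node names.
# Prints the reserved list, comma-separated, with every down node
# swapped for the next unused spare. Hypothetical helper, not part of Slurm.
replace_down_nodes() (
    reserved=$1
    down=$2
    spares=$3
    result=""
    for node in $reserved; do
        case " $down " in
            *" $node "*)
                # Take the first remaining spare; if none is left, drop the node.
                set -- $spares
                if [ $# -gt 0 ]; then
                    node=$1
                    shift
                    spares="$*"
                else
                    continue
                fi
                ;;
        esac
        result="${result:+$result,}$node"
    done
    echo "$result"
)
```

A cron job might gather the down nodes with something like `sinfo -h -N -t down -o '%N'`, pass the lists to the helper, and then apply the result with `scontrol update ReservationName=<name> Nodes=<list>`, where the reservation name is site-specific.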

[slurm-dev] Re: User Control of WallTime for running job

2015-11-17 Thread Eckert, Phil
The reason this has a higher permission level is that a user could game the system by submitting a job with a 1 minute time limit, which will generally get it started very quickly because of backfill, then they could increase it to whatever they wanted. I believe almost all batch system

[slurm-dev] Re: Requested node configuration is not available when using -c

2014-09-09 Thread Eckert, Phil
Mike, In your slurm.conf you have Procs=1 (which is the same as CPUS=1), and Sockets (if omitted, will be inferred from CPUS; default is 1) and CoresPerSocket (default is 1). So at this point the slurm.conf has a default configuration of 1 core per node. Phil Eckert LLNL From: Michal Zielinski
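
The arithmetic behind that conclusion can be sketched as follows. The helper names are hypothetical, and ThreadsPerCore (which also defaults to 1) is added here as an assumption beyond what the reply mentions. With every value left at 1, a node offers one CPU, so a request made with -c 2 cannot be placed on it.

```shell
# cpus_per_node SOCKETS CORES_PER_SOCKET THREADS_PER_CORE
# Prints the CPU count implied by a node's topology settings.
# Hypothetical helper for illustration; not a Slurm command.
cpus_per_node() {
    echo $(( $1 * $2 * $3 ))
}

# fits_on_node CPUS_PER_TASK NODE_CPUS
# Succeeds only if a task asking for CPUS_PER_TASK CPUs (the -c value)
# fits within the CPUs of a single node.
fits_on_node() {
    [ "$1" -le "$2" ]
}

# All defaults: 1 socket x 1 core x 1 thread.
if fits_on_node 2 "$(cpus_per_node 1 1 1)"; then
    echo "-c 2 fits"
else
    echo "-c 2 does not fit on a 1-CPU node"
fi
```

On a real node, `slurmd -C` prints the hardware layout that slurmd detects, which can be compared against the NodeName line in slurm.conf.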

[slurm-dev] Re: Fwd: Can I stop slurm from copying a script to execution node

2014-07-10 Thread Eckert, Phil
If you don’t wish to do the submission from the “somepath” directory you can use the following sbatch option to achieve what you are looking for. -D, --workdir=directory Set the working directory of the batch script to directory before it is executed. Phil Eckert LLNL

[slurm-dev] Re: pbsdsh -u equivalent

2014-06-30 Thread Eckert, Phil
Hartley, Sounds like you might be wanting srun. If I ask for 5 nodes on our rzmerl system: salloc -p pdebug -N 5 salloc: Granted job allocation 1966117 srun hostname rzmerl1 rzmerl2 rzmerl4 rzmerl3 rzmerl5 Phil Eckert LLNL From: Hartley Greenwald

[slurm-dev] Re: moab/slurm question

2014-04-23 Thread Eckert, Phil
Marti, If the job is submitted using msub, the release of the dependency would need to be: mjobctl -m depend=none jobid If you use: mjobctl -m depend= jobid it only removes the dependency in Moab, not Slurm. This works fine if you are using just-in-time scheduling, since the jobs only

[slurm-dev] Re: backfill scheduler look ahead?

2014-02-21 Thread Eckert, Phil
Bill, In addition to what Alejandro said, there is another consideration. You indicated the top two high priority jobs and the 30 core job, I'm assuming that the ... indicated a number of other queued jobs ahead of the 30 core job. Also, you didn't state it, but I'm also assuming there were

[slurm-dev] Re: Can't use sbatch with cron

2013-11-22 Thread Eckert, Phil
A lot of suggestions of what to check for here: https://groups.google.com/forum/#!topic/slurm-devel/qduhQ5EbjaQ Phil Eckert LLNL On 11/21/13 5:00 PM, Arun Durvasula arun.durvas...@gmail.com wrote: Zero Bytes were transmitted or received

[slurm-dev] Re: Admin reservation on busy nodes

2013-11-12 Thread Eckert, Phil
I see the nodes busy message only if I am trying to create a reservation on top of another reservation that includes the same nodes. You might try adding the overlap flag if this is the case. Phil Eckert LLNL From: Jacqueline Scoggins jscogg...@lbl.gov Reply-To:

[slurm-dev] Re: Admin reservation on busy nodes

2013-11-12 Thread Eckert, Phil
=ignore_jobs nodes=tnodes[32-591] starttime=now endtime=tomorrow partition=pbatch user=eckert Phil Eckert LLNL From: Jacqueline Scoggins jscogg...@lbl.gov Reply-To: slurm-dev slurm-dev@schedmd.com Date: Tuesday, November 12, 2013 10:58 AM

[slurm-dev] Re: Admin reservation on busy nodes

2013-11-12 Thread Eckert, Phil
of the names were not valid. So I tried only the partition and it still did not work. Thanks Jackie On Tue, Nov 12, 2013 at 11:49 AM, Eckert, Phil ecke...@llnl.gov wrote: Jackie, I was trying this with an earlier version of SLURM, I just built a 2.5.7 test system

[slurm-dev] Re: Job count exceeds limit

2013-08-09 Thread Eckert, Phil
I believe you have exceeded the MaxJobCount specified in your slurm.conf, or have reached the default of 10000 jobs. MaxJobCount The maximum number of jobs SLURM can have in its active database at one time. Set the values of MaxJobCount and MinJobAge to insure

[slurm-dev] Re: Job submit plugin to improve backfill

2013-06-28 Thread Eckert, Phil
Another route that could be taken is to set the DefaultTime for a partition to 0, and the small patch attached to this email will reject a job when it has no time limit specified and the default_time limit is 0. I also modified the ESLURM_INVALID_TIME_LIMIT to include information that the error

[slurm-dev] Re: fairshare usage

2013-01-22 Thread Eckert, Phil
Have you looked at sshare? Phil Eckert LLNL From: Mario Kadastik mario.kadas...@cern.ch Reply-To: slurm-dev slurm-dev@schedmd.com Date: Tuesday, January 22, 2013 11:17 AM To: slurm-dev slurm-dev@schedmd.com

[slurm-dev] Re: Problem submitting jobs from a non-compute node

2012-12-11 Thread Eckert, Phil
I have scp'd it as moab.log.invalid.gz On 12/11/12 1:00 PM, Moe Jette je...@schedmd.com wrote: I would guess that your machine can communicate with the cluster's head node (where the slurmctld daemon executes and creates the job allocation), but not the compute nodes (where the slurmd daemons

[slurm-dev] Re: Job name env var not set correctly

2012-10-09 Thread Eckert, Phil
In the sbatch code, it checks to see if a job name is provided, if so it will set the SLURM_JOB_NAME environment variable, but since the overwrite argument of the call is 0, it doesn't do so if the variable is already set, which is the case you are running into once the first job is submitted.
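
The "overwrite argument is 0" behaviour matches C's setenv(name, value, 0), which leaves an existing variable untouched. A small shell sketch of the same semantics follows; the function name `setenv_no_overwrite` and the values "first" and "second" are made up for illustration and are not taken from the sbatch source.

```shell
# setenv_no_overwrite NAME VALUE
# Mimics C setenv(NAME, VALUE, 0): export NAME=VALUE only if NAME is not already set.
# Hypothetical helper for illustration.
setenv_no_overwrite() {
    # ${NAME+x} expands to "x" when NAME is set, even if it is empty.
    if eval "[ -z \"\${$1+x}\" ]"; then
        export "$1=$2"
    fi
}

unset SLURM_JOB_NAME
setenv_no_overwrite SLURM_JOB_NAME first    # variable unset: takes the value "first"
setenv_no_overwrite SLURM_JOB_NAME second   # already set: left unchanged
echo "$SLURM_JOB_NAME"                      # prints "first"
```

If that is what is happening, one possible workaround is to unset SLURM_JOB_NAME in the submitting environment before the next sbatch call, so the new job name is not shadowed by the inherited one.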

[slurm-dev] Re: Problem with quotes in sched/wiki2 plugin

2012-06-06 Thread Eckert, Phil
According to Adaptive this change was introduced in: 5_4 branch as of the .0 version changeset 7922ced7105a79a3 Phil Eckert LLNL On 6/6/12 1:29 PM, Eckert, Phil ecke...@llnl.gov wrote: In Moab 6.1 and later the Moab wiki does filter out the quotes in the data it gets from SLURM. We

[slurm-dev] Re: Implementing soft limits and notifications with Slurm/Moab

2012-06-05 Thread Eckert, Phil
Michael, I was curious, so I tried the: RESOURCELIMITPOLICY:ALWAYS,EXTENDEDVIOLATION:NOTIFY,CANCEL:12:00:00 parameter on my test cluster so that I could observe the behavior, and I also used the OverTimeLimit parameter in my SLURM test system. When the initial time limit is reached, I see that