[slurm-dev] How much RealMemory to define?

2017-07-25 Thread Roe Zohar
Hello all,
I changed FastSchedule to 1 and I want to change RealMemory from 1
to a real value.

How much memory should I define for a node? I understand that this value is
how much memory is available for jobs and not the real total memory of the
server. Is this correct?

If so, and I have 128 GB of memory, how much should I put in slurm.conf for
the node (since the OS also uses part of the memory, for example)?
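
For illustration, the kind of node definition I mean, with made-up values
(a sketch, not a recommendation):

    # 128 GB total = 131072 MB; leave some headroom for the OS and slurmd:
    NodeName=srv1 CPUs=32 RealMemory=126976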

Thanks in advance,
Roy


[slurm-dev] Re: slurm 17.2.06 min memory problem

2017-07-10 Thread Roe Zohar
Hi Loris,

I am using SelectType=select/cons_res.
Still, when sending jobs, every server is getting only one job, with
memory=*MaxServerMemory* assigned to the job.

Adding DefMemPerCPU was the only solution, but I don't understand this
behavior.
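
For reference, the relevant slurm.conf lines (the DefMemPerCPU figure below
is just an example):

    SelectType=select/cons_res
    SelectTypeParameters=CR_LLN,CR_CPU_Memory
    DefMemPerCPU=4000    # default MB per CPU when a job omits --mem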

Thanks,
Roy

On Jul 10, 2017 8:47 AM, "Loris Bennett" <loris.benn...@fu-berlin.de> wrote:


Hi Roy,

Roe Zohar <roezo...@gmail.com> writes:

> slurm 17.2.06 min memory problem
>
> Hi all,
> I have installed the latest Slurm version and I have noticed some strange
> behavior with the memory allocated for jobs.
> In my slurm.conf I have:
> SelectTypeParameters=CR_LLN,CR_CPU_Memory
>
> Now, when I am sending a new job without giving it a --mem amount, it is
> automatically assigned all the server memory, which means I am getting
> only one job per server.
>
> I had to add DefMemPerCPU in order to get around that.
>
> Anybody know why that is?
>
> Thanks,
> Roy

What value of SelectType are you using?  Note also that CR_LLN schedules
jobs to the least loaded nodes, so until all nodes have one job, you
will not get more than one job per node.  See 'man slurm.conf'.

Regards

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de


[slurm-dev] slurm 17.2.06 min memory problem

2017-07-08 Thread Roe Zohar
Hi all,
I have installed the latest Slurm version and I have noticed some strange
behavior with the memory allocated for jobs.
In my slurm.conf I have:
SelectTypeParameters=CR_LLN,CR_CPU_Memory

Now, when I am sending a new job without giving it a --mem amount, it is
automatically assigned all the server memory, which means I am getting only
one job per server.

I had to add DefMemPerCPU in order to get around that.
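
Requesting memory explicitly also avoids it, e.g. (script name and value
are made up):

    sbatch --mem=4000 job.sh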

Anybody know why that is?

Thanks,
Roy


[slurm-dev] Slurm world wide statistics

2017-05-04 Thread Roe Zohar
Hello.
A little off topic: do you have any info or links about Slurm usage around
the world?
Which big companies use it, and what share of HPC clusters worldwide run
Slurm versus other schedulers?

Thanks in advance,
Roy


[slurm-dev] Re: Priority blocking jobs despite idle machines

2017-03-24 Thread Roe Zohar
I had the same problem. Didn't find a solution.
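
One untested guess: the jobs below run with TimeLimit=UNLIMITED, and the
backfill scheduler can only plan around a blocked high-priority job when it
knows how long the running jobs will take. Something like this in
slurm.conf, plus real time limits on jobs, might help (values are
illustrative):

    SchedulerType=sched/backfill
    SchedulerParameters=bf_continue,bf_window=1440   # bf_window in minutes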

On Mar 24, 2017 12:59 PM, "Stefan Doerr"  wrote:

> OK, so after some investigation, it seems the problem occurs when there
> are miha jobs pending in the queue (I truncated the squeue output earlier
> for brevity).
> These miha jobs require ace1, ace2 machines so they are pending since
> these machines are full right now.
> For some reason SLURM thinks that because the miha jobs are pending it
> cannot work on my jobs (which don't require specific machines) and puts
> mine pending as well.
>
> Once we cancelled the pending miha jobs (leaving the running ones
> running), I also cancelled my pending jobs, resubmitted them, and it
> worked.
>
> This seems to me like quite a problematic limitation in SLURM.
> Any opinions on this?
>
> On Fri, Mar 24, 2017 at 10:43 AM, Stefan Doerr wrote:
>
>> Hi, I was wondering about the following. I have this situation:
>>
>>  84974 multiscal testpock   sdoerr PD   0:00  1 (Priority)
>>  84973 multiscal testpock   sdoerr PD   0:00  1 (Priority)
>>  81538 multiscalRC_f7 miha  R   17:41:56  1 ace2
>>  81537 multiscalRC_f6 miha  R   17:42:00  1 ace2
>>  81536 multiscalRC_f5 miha  R   17:42:04  1 ace2
>>  81535 multiscalRC_f4 miha  R   17:42:08  1 ace2
>>  81534 multiscalRC_f3 miha  R   17:42:12  1 ace1
>>  81533 multiscalRC_f2 miha  R   17:42:16  1 ace1
>>  81532 multiscalRC_f1 miha  R   17:42:20  1 ace1
>>  81531 multiscalRC_f0 miha  R   17:42:24  1 ace1
>>
>>
>> [sdoerr@xxx Fri10:35 slurmtest]  sinfo
>> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>> multiscale*  up   infinite  1 drain* ace3
>> multiscale*  up   infinite  1  down* giallo
>> multiscale*  up   infinite  2mix ace[1-2]
>> multiscale*  up   infinite  5   idle arancio,loro,oliva,rosa,suo
>>
>>
>> The miha jobs exclude nodes so that they run only on the machines with
>> good GPUs (ace1, ace2).
>> As you can see, I have 5 idle machines that could serve my jobs, but my
>> jobs are for some reason stuck pending due to "Priority". I am indeed
>> very sure that these 5 nodes satisfy the hardware requirements for my
>> jobs (I also ran them there yesterday).
>>
>> It's just that, for some reason we have run into before, these
>> node-excluding miha jobs seem to get the rest stuck on Priority. If we
>> cancel them, mine go through to the idle machines. However, we cannot
>> figure out the cause. I paste below the scontrol show job output for one
>> miha job and one of mine.
>>
>> Many thanks!
>>
>>
>> JobId=81534 JobName=RC_f3
>>UserId=miha(3056) GroupId=lab(3000)
>>Priority=33 Nice=0 Account=lab QOS=normal
>>JobState=RUNNING Reason=None Dependency=(null)
>>Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>>RunTime=17:44:07 TimeLimit=UNLIMITED TimeMin=N/A
>>SubmitTime=2017-03-23T16:53:14 EligibleTime=2017-03-23T16:53:14
>>StartTime=2017-03-23T16:53:15 EndTime=Unknown
>>PreemptTime=None SuspendTime=None SecsPreSuspend=0
>>Partition=multiscale AllocNode:Sid=blu:26225
>>ReqNodeList=(null) ExcNodeList=arancio,giallo,loro,oliva,pink,rosa,suo
>>NodeList=ace1
>>BatchHost=ace1
>>NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>>TRES=cpu=1,mem=11500,node=1,gres/gpu=1
>>Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>>MinCPUsNode=1 MinMemoryNode=11500M MinTmpDiskNode=0
>>Features=(null) Gres=gpu:1 Reservation=(null)
>>Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>Power= SICP=0
>>
>>
>> JobId=84973 JobName=testpock2
>>UserId=sdoerr(3041) GroupId=lab(3000)
>>Priority=33 Nice=0 Account=lab QOS=normal
>>JobState=PENDING Reason=Priority Dependency=(null)
>>Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>>RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
>>SubmitTime=2017-03-24T10:31:20 EligibleTime=2017-03-24T10:31:20
>>StartTime=2019-03-24T10:38:10 EndTime=Unknown
>>PreemptTime=None SuspendTime=None SecsPreSuspend=0
>>Partition=multiscale AllocNode:Sid=loro:15424
>>ReqNodeList=(null) ExcNodeList=(null)
>>NodeList=(null)
>>NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>>TRES=cpu=1,mem=4000,node=1,gres/gpu=1
>>Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>>MinCPUsNode=1 MinMemoryNode=4000M MinTmpDiskNode=0
>>Features=(null) Gres=gpu:1 Reservation=(null)
>>Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>Power= SICP=0
>>
>>
>>
>


[slurm-dev] Priority issue

2017-01-24 Thread Roe Zohar
Hello,
I have 1000 jobs in the queue. The queue has 4 servers, yet only two of
them have running jobs, and all the other jobs are pending with reason
"Priority". It looks as if the other jobs will only start running once the
currently running jobs finish.

Any idea why they are waiting on Priority now instead of starting to run
on the idle servers?
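
In case it helps, this is what I have been checking so far (the job id is
a placeholder):

    squeue --start             # estimated start times and pending reasons
    sprio -l                   # per-job priority factors
    scontrol show job 12345    # ReqNodeList/ExcNodeList/Features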

thanks


[slurm-dev] job wait for priority when server is empty

2017-01-10 Thread Roe Zohar
Hello,
Really hoping for insight here.

I have a cluster, let's say T-CLUSTER,
with servers and features like this:
srv1 features=fit1
srv2 features=fit1
srv3 features=fit1
mok1 features=fit2
mok2 features=fit2
mok3 features=fit2
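
In slurm.conf terms, that corresponds to something like this (CPU and
memory settings omitted for brevity):

    NodeName=srv[1-3] Features=fit1
    NodeName=mok[1-3] Features=fit2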

I sent 3000 jobs to this cluster with --constraint=fit1 and then sent 3000
jobs with --constraint=fit2.

The jobs with fit1 started running on their servers, but the jobs with fit2
are pending with reason Priority even though their servers are free.

When I try to release a job, nothing happens; when I try to resume a fit2
job, I get "job is pending".

Any idea why there is a priority issue when the jobs request a feature
whose servers are all idle?

Thanks in advance,
Roy


[slurm-dev] Re: Max priority

2016-12-20 Thread Roe Zohar
Well, as far as I can see, it looks like 4294967293 is the max priority I
can give without getting an error.
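
That fits the arithmetic: priority is stored as an unsigned 32-bit integer,
and the top two values (4294967295 and 4294967294) appear to be reserved as
Slurm's internal INFINITE and NO_VAL markers, leaving 4294967293 as the
largest usable value. For example (job id is a placeholder):

    scontrol update jobid=12345 priority=4294967293   # accepted
    scontrol update jobid=12345 priority=4294967294   # rejected with an error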

On Dec 20, 2016 1:07 PM, "Roe Zohar" <roezo...@gmail.com> wrote:

> Hi Sean,
> Thanks for your reply.
>
> Sometimes you need another eye to look at things after a while. I looked
> everywhere but didn't see an option to set the priority in the batch file
> when sending jobs, only with "scontrol update". This will help a lot.
>
> About setting it above the greatest currently pending priority, that's
> right; it's just that I wanted to avoid looping over all pending jobs.
> Maybe I'll run some tests to find out.
>
> Thanks again,
> Roe
>
> On Dec 20, 2016 12:23 PM, "Sean McGrath" <smcg...@tchpc.tcd.ie> wrote:
>
>
> Hi,
>
> On Tue, Dec 20, 2016 at 02:09:12AM -0800, Roe Zohar wrote:
>
> > Hello, I have Slurm configured with the basic builtin priority (I can't
> > change that right now). I need to let someone send jobs with a higher
> > priority than the jobs already in the queue.
> >
> > I don't see a way to set the priority when sending jobs, only after the
> > job is in the queue. Is that correct?
> >
>
> sbatch, salloc & srun man pages have the following:
>
>    --priority=<value>
>           Request a specific job priority.  May be subject to
>           configuration specific constraints.  Only Slurm operators and
>           administrators can set the priority of a job.
>
> > Second, I thought of checking the priority of currently running jobs
> > and then looping through the new jobs I sent and updating their
> > priority to a higher one. Does this sound good? If so, what is the max
> > priority a Slurm job can get?
> >
>
> Increasing the priority of pending jobs to bump them above the other
> pending jobs has been our approach to this type of thing before.
>
> sprio -l
> scontrol update jobid=12345 priority=1
>
> I don't know what the max priority can be set to, sorry. You should only
> need to set it above the priority score of the highest-priority pending
> job for yours to take precedence over the other pending jobs.
>
> Hope that helps.
>
> Sean
>
> > (I know it's not best practice, but that's what I need right now)
> >
> > Thanks in advance.
> > Roy
>
> --
> Sean McGrath M.Sc
>
> Systems Administrator
> Trinity Centre for High Performance and Research Computing
> Trinity College Dublin
>
> sean.mcgr...@tchpc.tcd.ie
>
> https://www.tcd.ie/
> https://www.tchpc.tcd.ie/
>
> +353 (0) 1 896 3725
>
>
>


[slurm-dev] Re: Max priority

2016-12-20 Thread Roe Zohar
Hi Sean,
Thanks for your reply.

Sometimes you need another eye to look at things after a while. I looked
everywhere but didn't see an option to set the priority in the batch file
when sending jobs, only with "scontrol update". This will help a lot.

About setting it above the greatest currently pending priority, that's
right; it's just that I wanted to avoid looping over all pending jobs.
Maybe I'll run some tests to find out.

Thanks again,
Roe

On Dec 20, 2016 12:23 PM, "Sean McGrath" <smcg...@tchpc.tcd.ie> wrote:


Hi,

On Tue, Dec 20, 2016 at 02:09:12AM -0800, Roe Zohar wrote:

> Hello, I have Slurm configured with the basic builtin priority (I can't
> change that right now). I need to let someone send jobs with a higher
> priority than the jobs already in the queue.
>
> I don't see a way to set the priority when sending jobs, only after the
> job is in the queue. Is that correct?
>

sbatch, salloc & srun man pages have the following:

   --priority=<value>
          Request a specific job priority.  May be subject to
          configuration specific constraints.  Only Slurm operators and
          administrators can set the priority of a job.
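
A usage sketch (script name and value are made up; as the man page says,
only operators and administrators can do this):

   sbatch --priority=100000 job.sh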

> Second, I thought of checking the priority of currently running jobs and
> then looping through the new jobs I sent and updating their priority to a
> higher one. Does this sound good? If so, what is the max priority a Slurm
> job can get?
>

Increasing the priority of pending jobs to bump them above the other pending
jobs has been our approach to this type of thing before.

sprio -l
scontrol update jobid=12345 priority=1

I don't know what the max priority can be set to, sorry. You should only
need to set it above the priority score of the highest-priority pending
job for yours to take precedence over the other pending jobs.

Hope that helps.

Sean

> (I know it's not best practice, but that's what I need right now)
>
> Thanks in advance.
> Roy

--
Sean McGrath M.Sc

Systems Administrator
Trinity Centre for High Performance and Research Computing
Trinity College Dublin

sean.mcgr...@tchpc.tcd.ie

https://www.tcd.ie/
https://www.tchpc.tcd.ie/

+353 (0) 1 896 3725


[slurm-dev] Give user permission to change other jobs

2016-12-19 Thread Roe Zohar
Hello,
Is there a way to let a user change features on jobs submitted by other
users?
Basically, I want every department to have a super user who can manage all
jobs in a specific partition.
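
For context: the closest mechanism I am aware of is the AdminLevel setting
in the accounting database, which as far as I can tell is cluster-wide
rather than per-partition (the user name is a placeholder):

    sacctmgr modify user name=deptadmin set adminlevel=operator
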
Thanks,
Roy