[slurm-dev] Re: poor utilization, jobs not being scheduled

2016-10-30 Thread Benjamin Redling

On 31.10.2016 at 00:47, Vlad Firoiu wrote:
> What do you mean the SchedulerType is not explicit? I see
> `SchedulerType=sched/backfill`. (I don't know too much about slurm so I
> am probably misunderstanding something.)

Vlad, you are right: SchedulerType _is_ set. I meant SelectType.
See my other mail, where I have since corrected the typo and added a
few uncurated ideas.

Cheers, Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: poor utilization, jobs not being scheduled

2016-10-30 Thread Benjamin Redling

On 31.10.2016 at 00:23, Benjamin Redling wrote:
> Are you aware that as long as SchedulerType 

Sorry, typo. I meant *SelectType*

(The rest I wrote next is just unfiltered noise from my brain while
skimming the conf:)

>is not set to anything explicitly, select/linear is the default? 

> On 30 October 2016 19:00:28 CET, Vlad Firoiu wrote:
[...]
> ## SelectType=select/linear

This seems bad for good utilization.
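(A minimal sketch of why the line quoted above does not count as "set": it
starts with '#', so it is a comment. This assumes a simplified reading of
slurm.conf with one key per line; the helper name effective_settings is made
up for illustration and is not part of Slurm. The select/linear fallback is
the default stated earlier in this thread.)

```python
def effective_settings(conf_text):
    """Return the key/value pairs that are active (not commented out)."""
    settings = {}
    for raw in conf_text.splitlines():
        # Everything after '#' is a comment, so '## SelectType=...' is ignored.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Parameter names are case-insensitive, so normalize them.
        settings[key.strip().lower()] = value.strip()
    return settings


# Excerpt of the configuration posted in this thread.
conf = """\
SchedulerType=sched/backfill
## SelectType=select/linear
PriorityWeightFairShare = 1000
"""

active = effective_settings(conf)
# Assumption taken from this thread: select/linear applies when SelectType is unset.
select_type = active.get("selecttype", "select/linear (default, not set explicitly)")
print("SchedulerType:", active["schedulertype"])
print("SelectType:", select_type)
```

On a running cluster, "scontrol show config" should show the value that is
actually in effect.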

> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityUsageResetPeriod=MONTHLY

I'm undecided about that...

> #
[...]
> PriorityWeightQOS   = 10
> PriorityWeightFairShare = 1000

QOS/Fairshare clearly dominate: I hope accounting is set up correctly -- in
the "portion" of your slurm.conf you didn't provide?

> PriorityWeightAge   = 10

Does this even matter compared to QOS and Fairshare?

> PriorityWeightJobSize   = 1

Any reason for not using PriorityFavorSmall (with or without
SMALL_RELATIVE_TO_TIME) and taking the job size into account? I think that
would improve utilization in combination with a SelectType that takes
consumable resources (CR) into account.
The 10min job you mentioned would not starve behind the big ones.


> PriorityWeightPartition = 1
> PriorityMaxAge=1-0

This way the age_factor maxes out after one day -- fast compared to
the default of a week. But age doesn't really matter compared to QOS and
Fairshare... what's the idea behind that?
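(A rough sketch of the arithmetic behind that remark, assuming the basic
multifactor form in which each factor lies between 0.0 and 1.0 and is
multiplied by its weight. The function names and the example factor values
are invented for illustration; the real plugin has further terms that are
left out here.)

```python
def days(n):
    """Convert days to seconds."""
    return n * 24 * 60 * 60


# Weights and PriorityMaxAge as posted in this thread.
WEIGHTS = {"qos": 10, "fairshare": 1000, "age": 10, "jobsize": 1, "partition": 1}
PRIORITY_MAX_AGE = days(1)


def age_factor(seconds_waiting, max_age=PRIORITY_MAX_AGE):
    """Grows linearly with wait time and is capped at 1.0 once max_age is reached."""
    return min(seconds_waiting / max_age, 1.0)


def job_priority(factors, weights=WEIGHTS):
    """Weighted sum of the factors; a missing factor counts as 0.0."""
    return sum(weights[name] * factors.get(name, 0.0) for name in weights)


# Hypothetical jobs: one waited three days with no fair-share credit,
# the other was just submitted by a user with a fair-share factor of 0.5.
old_job = {"age": age_factor(days(3)), "fairshare": 0.0}
new_job = {"age": age_factor(0), "fairshare": 0.5}

print("old job:", job_priority(old_job))
print("new job:", job_priority(new_job))
```

With these weights the age term can contribute at most 10 points, while the
fair-share term can contribute up to 1000, so waiting longer barely moves a
job up the queue.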


> The particular job in question has 0 priority.

My fault: it would have been better to ask for "sprio -w" with the
individual weights in the first place.

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Can slurm work on one node?

2016-10-30 Thread Lachlan Musicman
I think it should. Can you send through your slurm.conf?

Also, the logs usually explicitly say why slurmctld/slurmd don't start, and
the best way to judge if slurm is running is with systemd:

systemctl status slurmctld
systemctl status slurmd



cheers
L.

--
The most dangerous phrase in the language is, "We've always done it this
way."

- Grace Hopper

On 31 October 2016 at 10:34, Peixin Qiao  wrote:

> I installed slurm-16.05.6 on ubuntu 16.04 on one node.
>
> When I started slurmctld and slurmd, they did not start.
>
> I ran sinfo; the output is: slurm_load_partitions: Unable to contact
> slurm controller (connect failure)
> I ran ps -ef | grep slurm; there is no output.
>
> Best Regards,
> Peixin
>


[slurm-dev] Re: poor utilization, jobs not being scheduled

2016-10-30 Thread Vlad Firoiu
What do you mean the SchedulerType is not explicit? I see
`SchedulerType=sched/backfill`. (I don't know too much about slurm so I am
probably misunderstanding something.)

On Sun, Oct 30, 2016 at 7:25 PM, Benjamin Redling <
benjamin.ra...@uni-jena.de> wrote:

> Are you aware that as long as SchedulerType is not set to anything
> explicitly, select/linear is the default? You won't get good utilization
> with that (Tuning the rest seems secondary to me. Only worth a look if you
> confirm that you want to reserve whole nodes.) Regards, Benjamin
>
>
> On 30 October 2016 19:00:28 CET, Vlad Firoiu wrote:
>>
>> Here's a portion of slurm.conf:
>>
>> # SCHEDULING
>> SchedulerType=sched/backfill
>> SchedulerParameters=bf_continue,bf_window=10080,bf_resolution=600,bf_max_job_test=1,bf_interval=30,bf_max_job_user=50
>> #SchedulerAuth=
>> #SchedulerPort=
>> #SchedulerRootFilter=
>> ## SelectType=select/linear
>> FastSchedule=0
>> PriorityType=priority/multifactor
>> PriorityDecayHalfLife=7-0
>> PriorityUsageResetPeriod=MONTHLY
>> #PriorityDecayHalfLife=14-0
>> #PriorityUsageResetPeriod=14-0
>> #PriorityWeightFairshare=10
>> #PriorityWeightAge=1000
>> #PriorityWeightPartition=1
>> #PriorityWeightJobSize=1000
>> #PriorityMaxAge=1-0
>> #
>> PriorityWeightQOS   = 10
>> PriorityWeightFairShare = 1000
>> PriorityWeightAge   = 10
>> PriorityWeightJobSize   = 1
>> PriorityWeightPartition = 1
>> PriorityMaxAge=1-0
>>
>> The particular job in question has 0 priority.
>>
>> On Sat, Oct 29, 2016 at 7:50 PM, Benjamin Redling <
>> benjamin.ra...@uni-jena.de> wrote:
>>
>>> Are you allowed and able to post the slurm.conf? What does sprio -o %Q
>>> -j  say about that job? BR, Benjamin
>>>
>>> On 29 October 2016 20:56:18 CEST, Vlad Firoiu <
>>> vlad...@gmail.com> wrote:

>>>> I'm trying to figure out why utilization is low on our university
>>>> cluster. It appears that many cores are available, but a minimal resource
>>>> 10 minute job has been waiting in queue for days. There happen to be some
>>>> big high priority jobs at the front of the queue, and I've noticed that
>>>> these are being constantly scheduled and unscheduled. Is this expected
>>>> behavior? Might it be causing slurm to never reach lower priority jobs and
>>>> consider them for scheduling/backfill?

>>>
>>> --
>>> This message was sent from my Android mobile phone with K-9 Mail.
>>>
>>
>>
> --
> This message was sent from my Android mobile phone with K-9 Mail.
>


[slurm-dev] Can slurm work on one node?

2016-10-30 Thread Peixin Qiao
I installed slurm-16.05.6 on ubuntu 16.04 on one node.

When I started slurmctld and slurmd, they did not start.

I ran sinfo; the output is: slurm_load_partitions: Unable to contact
slurm controller (connect failure)
I ran ps -ef | grep slurm; there is no output.

Best Regards,
Peixin


[slurm-dev] Re: Slurm versions 16.05.6 and 17.02.0-pre3 are now available

2016-10-30 Thread Christopher Samuel

On 29/10/16 00:58, Peixin Qiao wrote:

> Will the slurm version 16.05.6 support ubuntu 16.04?

If you build it from source I suspect any moderately recent version will
work there.

If you are asking about the Ubuntu packaged version, then that's a
question for Canonical, not SchedMD. :-)

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: poor utilization, jobs not being scheduled

2016-10-30 Thread Vlad Firoiu
Here's a portion of slurm.conf:

# SCHEDULING
SchedulerType=sched/backfill
SchedulerParameters=bf_continue,bf_window=10080,bf_resolution=600,bf_max_job_test=1,bf_interval=30,bf_max_job_user=50
#SchedulerAuth=
#SchedulerPort=
#SchedulerRootFilter=
## SelectType=select/linear
FastSchedule=0
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityUsageResetPeriod=MONTHLY
#PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
#PriorityWeightFairshare=10
#PriorityWeightAge=1000
#PriorityWeightPartition=1
#PriorityWeightJobSize=1000
#PriorityMaxAge=1-0
#
PriorityWeightQOS   = 10
PriorityWeightFairShare = 1000
PriorityWeightAge   = 10
PriorityWeightJobSize   = 1
PriorityWeightPartition = 1
PriorityMaxAge=1-0

The particular job in question has 0 priority.

On Sat, Oct 29, 2016 at 7:50 PM, Benjamin Redling <
benjamin.ra...@uni-jena.de> wrote:

> Are you allowed and able to post the slurm.conf? What does sprio -o %Q -j
>  say about that job? BR, Benjamin
>
> On 29 October 2016 20:56:18 CEST, Vlad Firoiu wrote:
>>
>> I'm trying to figure out why utilization is low on our university
>> cluster. It appears that many cores are available, but a minimal resource
>> 10 minute job has been waiting in queue for days. There happen to be some
>> big high priority jobs at the front of the queue, and I've noticed that
>> these are being constantly scheduled and unscheduled. Is this expected
>> behavior? Might it be causing slurm to never reach lower priority jobs and
>> consider them for scheduling/backfill?
>>
>
> --
> This message was sent from my Android mobile phone with K-9 Mail.
>


[slurm-dev] Re: slurm network address problem ?

2016-10-30 Thread Mikhail Kuzminsky



> from Ole Holm Nielsen:
> You might want to check out my Wiki-page for setting up Slurm on CentOS 7.2:
> https://wiki.fysik.dtu.dk/niflheim/SLURM .
> Perhaps you'll solve the problem using this information?

Thank you very much for your reply!
I have read your good page again.
In my opinion you should add a few lines of your own to it about adding
(in your example) the corresponding SlurmctldLogFile and SlurmdLogFile
statements to slurm.conf.

To repeat: in my previous message I wrote about 2 slurm errors.
Slurm now works on my PC, and in particular it no longer reports any
NETWORK ADDRESS errors.
I did not make any changes related to network addresses, but slurm now works
OK.

Reading your site also gave me new ideas about what might be interesting to do.

Yours
Mikhail

 
>
>
>On 10/27/2016 04:14 PM, Mikhail Kuzminsky wrote:
>> I worked w/PBS and SGE; now I'm beginner w/slurm, and installed
>> slurm-16.05.5 on my home PC/x86-64 under CentOS 7.2 1511 (desktop
>> installation).
>>
>> 1) Munge isn't necessary for my one PC w/slurm. But
>>  (after standard rpmbuild)  directory /usr/lib64/slurm  don't have
>> auth_none.so plugin,
>> and AuthType=auth/none in slurm.conf can't work, I use AuthType=auth/munge.
>>
>> 2) Both slurmd and slurmctld work on "myhome1" node.
>> But "scontrol show nodes" informs that:
>> ...
>>  Reason=NO NETWORK ADDRESS FOUND [slurm@2016-10-16T10:49:05]
>>
>> Therefore any my srun's say
>> srun: Required node not available (down, drained or reserved)
>> srun: job NN queued and waiting for resources
>>
>>
>> I tried 3 variants of NodeName in slurm.conf : w/o NodeAddr and w/use
>> 192.168.1.10 or even 127.0.0.1
>> My current string in slurm.conf is:
>> NodeName=myhome1 Sockets=1 CoresPerSocket=2 ThreadsPerCore=1
>> NodeAddr=192.168.1.10 State=UNKNOWN
>>
>>
>> Statical IP w/192.168.0.10 local address is used on my PC.
>> /etc/hosts is:
>> 127.0.0.1   localhost localhost.localdomain localhost4
>> localhost4.localdomain4
>> ::1 localhost localhost.localdomain localhost6
>> localhost6.localdomain6
>> 192.168.1.10 myhome1.ru myhome1
>>
>> /etc/sysconfig/network-scripts/ifcfg-enp8s0   contains:
>> TYPE="Ethernet"
>> BOOTPROTO="static"
>> NAME="enp8s0"
>> DEVICE="enp8s0"
>> IPADDR="192.168.1.10"
>> GATEWAY="192.168.1.1"
>> DEFROUTE="yes"
>> ONBOOT="yes"
>> PREFIX="24"
>> etc
>>
>> External connection w/global Internet is realized via the same "enp8s0"
>> and home router at 192.168.1.1
>>
>> What should I do to have normal address for myhome1 for work w/slurm ?


Mikhail Kuzminsky