Re: [slurm-users] sacct -c not honor -M clusrername

2020-04-26 Thread Fred Liu
This approach is an alternative to “-c”.
Is it possible to make “-c” work with “-M”?

Thanks.

Fred


From: Sudeep Narayan Banerjee 
Sent: Monday, April 27, 2020 12:33 AM
To: Slurm User Community List; Fred Liu
Subject: Re: [slurm-users] sacct -c not honor -M clusrername


Dear Fred: should be possible

sacct --format=user,state --starttime=04/01/19 --endtime=03/31/20 | grep COMPLETED

Please let us know if this helps.

Thanks & Regards,
Sudeep Narayan Banerjee
System Analyst | Scientist B
Information System Technology Facility
Academic Block 5 | Room 110
Indian Institute of Technology Gandhinagar
Palaj, Gujarat 382355 INDIA

On 26/04/20 9:27 pm, Fred Liu wrote:

Hi,

Is it possible to get job completion stats per cluster?

Thanks.

Fred


Re: [slurm-users] sacct -c not honor -M clusrername

2020-04-26 Thread Sudeep Narayan Banerjee

Dear Fred: should be possible

sacct --format=user,state --starttime=04/01/19 --endtime=03/31/20 | grep COMPLETED
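
For a per-cluster breakdown, the same idea can be extended by asking sacct for the Cluster field and counting in awk. This is only a sketch: it reads the accounting database (not the "-c" completion log), the cluster names cluster1 and cluster2 are placeholders, and count_completed_by_cluster is a hypothetical helper name, not something shipped with Slurm.

```shell
# Hypothetical helper: reads "Cluster|State" lines on stdin (the layout
# produced by sacct -P --format=cluster,state) and prints one line per
# cluster with its number of COMPLETED jobs.
count_completed_by_cluster() {
  awk -F'|' '$2 == "COMPLETED" { n[$1]++ }
             END { for (c in n) print c, n[c] }' | sort
}

# Intended use (cluster names are placeholders; adjust the dates as needed):
#   sacct -M cluster1,cluster2 -a -X -n -P --format=cluster,state \
#         -S 04/01/19 -E 03/31/20 | count_completed_by_cluster
```

Here -X restricts the output to job allocations so that job steps are not counted twice, and -n -P drop the header and use "|" as the field delimiter.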


Please let us know if this helps.

Thanks & Regards,
Sudeep Narayan Banerjee
System Analyst | Scientist B
Information System Technology Facility
Academic Block 5 | Room 110
Indian Institute of Technology Gandhinagar
Palaj, Gujarat 382355 INDIA

On 26/04/20 9:27 pm, Fred Liu wrote:


Hi,

Is it possible to get job completion stats per cluster?

Thanks.

Fred


[slurm-users] sacct -c not honor -M clusrername

2020-04-26 Thread Fred Liu

Hi,

Is it possible to get job completion stats per cluster?

Thanks.

Fred


Re: [slurm-users] not allocating jobs even resources are free

2020-04-26 Thread navin srivastava
Thanks Brian,

As suggested, I went through the documentation, and what I understood is that
Fair Tree is the Fairshare mechanism and that jobs should be scheduled based
on it.

So does it mean job scheduling will be FIFO, but the priority will be
decided by Fairshare? I am not sure whether the two conflict here. As I see it,
the normal jobs' priority is lower than the GPUsmall priority, so if resources
are available in the GPUsmall partition, those jobs should run. No job is pending
due to GPU resources; the jobs do not request GPU resources at all.
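
One way to check what the pending GPUsmall jobs are actually waiting on is to tally the Reason field reported by squeue. The sketch below is only illustrative: count_pending_reasons is a hypothetical helper name, and the partition name is taken from this thread.

```shell
# Hypothetical helper: reads one pending reason per line on stdin and
# prints "<count> <reason>", most frequent first.
count_pending_reasons() {
  awk 'NF { n[$0]++ }
       END { for (r in n) printf "%d %s\n", n[r], r }' | sort -rn
}

# Intended use:
#   squeue -h -p GPUsmall -t PD -o "%r" | count_pending_reasons
```

A reason of Priority means higher-priority jobs are ahead in the queue, while Resources means the job is waiting for resources to become free.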

Is there any article where I can see how Fairshare works and which
settings must not conflict with it?
The documentation never says that FIFO should be disabled if fair-share
is applied.

Regards
Navin.





On Sat, Apr 25, 2020 at 12:47 AM Brian W. Johanson  wrote:

>
> If you haven't looked at the man page for slurm.conf, it will answer most
> if not all of your questions.
> https://slurm.schedmd.com/slurm.conf.html but I would rely on the
> manual version that was distributed with the version you have installed, as
> options do change.
>
> There is a ton of information that is tedious to get through but reading
> through it multiple times opens many doors.
>
> DefaultTime is listed in there as a Partition option.
> If you are scheduling gres/gpu resources, it's quite possible there are
> cores available with no corresponding gpus avail.
>
> -b
>
> On 4/24/20 2:49 PM, navin srivastava wrote:
>
> Thanks Brian.
>
> I need to check the job order.
>
> Is there any way to define a default time limit for a job if the user does
> not specify one?
>
> Also, what is the meaning of fair tree in the priority settings in the
> slurm.conf file?
>
> The sets of nodes are different in the partitions. FIFO does not care about
> partitions.
> Does strict ordering mean that the job that came first goes first, and
> until it runs, no other jobs are allowed to start?
>
> Also, the priority is high for the GPUsmall partition and low for normal jobs;
> the nodes of the normal partition are full, but GPUsmall cores are available.
>
> Regards
> Navin
>
> On Fri, Apr 24, 2020, 23:49 Brian W. Johanson  wrote:
>
>> Without seeing the jobs in your queue, I would expect the next job in
>> FIFO order to be too large to fit in the current idle resources.
>>
>> Configure it to use the backfill scheduler: SchedulerType=sched/backfill
>>
>>   SchedulerType
>>       Identifies the type of scheduler to be used. Note the slurmctld
>>       daemon must be restarted for a change in scheduler type to become
>>       effective (reconfiguring a running daemon has no effect for this
>>       parameter). The scontrol command can be used to manually change
>>       job priorities if desired. Acceptable values include:
>>
>>       sched/backfill
>>           For a backfill scheduling module to augment the default FIFO
>>           scheduling. Backfill scheduling will initiate lower-priority
>>           jobs if doing so does not delay the expected initiation time
>>           of any higher priority job. Effectiveness of backfill
>>           scheduling is dependent upon users specifying job time
>>           limits, otherwise all jobs will have the same time limit and
>>           backfilling is impossible. Note documentation for the
>>           SchedulerParameters option above. This is the default
>>           configuration.
>>
>>       sched/builtin
>>           This is the FIFO scheduler which initiates jobs in priority
>>           order. If any job in the partition can not be scheduled, no
>>           lower priority job in that partition will be scheduled. An
>>           exception is made for jobs that can not run due to partition
>>           constraints (e.g. the time limit) or down/drained nodes. In
>>           that case, lower priority jobs can be initiated and not
>>           impact the higher priority job.
>>
>>
>>
>> Your partitions are set with MaxTime=INFINITE; if your users are not
>> specifying a reasonable time limit for their jobs, this won't help either.
>>
>>
>> -b
>>
>>
>> On 4/24/20 1:52 PM, navin srivastava wrote:
>>
>> In addition to the above, this is what sprio shows for the jobs in each
>> partition.
>>
>> For the normal queue, all jobs show the same priority:
>>
>>   JOBID PARTITION   PRIORITY  FAIRSHARE
>> 1291352 normal         15789      15789
>>
>> For GPUsmall, all jobs show the same priority:
>>
>>   JOBID PARTITION   PRIORITY  FAIRSHARE
>> 1291339 GPUsmall       21052      21053
>>
>> On Fri, Apr 24, 2020 at 11:14 PM navin srivastava 
>> wrote:
>>
>>> Hi Team,
>>>
>>> We are facing an issue in our environment. Resources are free, but
>>> jobs go into the queued (pending) state and do not run.
>>>
>>> I have attached the slurm.conf file here.
>>>
>>> Scenario:
>>>
>>> There are jobs in only 2 partitions:
>>>  344 jobs are in PD state in the normal partition, and the nodes belonging
>>> to the normal partition are full, so no more jobs can run there.
>>>
>>> 1300 jobs are queued in the GPUsmall partition, and enough CPUs are
>>> available to execute them, but I see the jobs are not