Re: [gridengine users] Fair share policy

2019-02-28 Thread Mark Dixon
Hi Bill,

Yes, I think so!

https://arc.liv.ac.uk/trac/SGE/ticket/1550

Best,

Mark

On Thu, 28 Feb 2019, William Bryce wrote:

> Yup, there was one we saw when you have a share tree, turn off past usage
> (i.e. don't consider past usage at all), and have both parallel and serial
> jobs in the system.  The share tree ended up unbalanced in favour of the
> parallel jobs.  Did you guys fix that one too?
>
> Regards,
>
> Bill.
>
>

Re: [gridengine users] Fair share policy

2019-02-28 Thread William Bryce
Didn't know that, Mark. That is great. I remember there was more than one issue
with share tree and array jobs that we saw, but it didn't happen in the default
share tree configuration. I will have to check.

Regards 

Bill



Re: [gridengine users] Fair share policy

2019-02-28 Thread Mark Dixon
Hi Bill,

I fixed that share-tree-array-jobs priority problem some time ago, unless 
you're thinking of a different one?

https://arc.liv.ac.uk/trac/SGE/ticket/435
https://arc.liv.ac.uk/trac/SGE/changeset/4840/sge

We use share tree and array jobs all the time with no problems. It made it 
into a Son of Gridengine release.

Best,

Mark


Re: [gridengine users] Fair share policy

2019-02-27 Thread Reuti
Hi,

> On 27.02.2019 at 22:07, Kandalaft, Iyad (AAFC/AAC) wrote:
> 
> HI Reuti
> 
> I'm implementing only a share-tree.

Then you can set:

policy_hierarchy  S

The past usage is stored in the user object, hence auto_user_delete_time
should be zero. The delete_time of all user entries that were already created
should be zero as well (list them with qconf -suserl). The fshare value set
therein shouldn't be honored if you set up only the share-tree policy.
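
As an illustration of the two settings suggested above, here is a minimal Python sketch that checks saved qconf output for them. The helper names parse_conf and share_tree_only_issues are hypothetical, and the parser assumes the plain "name value" layout (with backslash line continuations) that qconf -ssconf and qconf -sconf print. The sample values OFS and 7776000 are the ones from the original post.

```python
def parse_conf(text):
    """Parse 'name value' lines such as those printed by qconf -ssconf or qconf -sconf."""
    conf = {}
    # Join lines that end in a backslash (long values are wrapped this way).
    joined = text.replace("\\\n", " ")
    for line in joined.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and headers such as "#global:"
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def share_tree_only_issues(sched_conf, global_conf):
    """Return warnings for a share-tree-only setup (hypothetical helper)."""
    issues = []
    hierarchy = sched_conf.get("policy_hierarchy")
    if hierarchy != "S":
        issues.append("policy_hierarchy is %s, suggested S" % hierarchy)
    delete_time = global_conf.get("auto_user_delete_time")
    if delete_time != "0":
        issues.append("auto_user_delete_time is %s, suggested 0" % delete_time)
    return issues


# Values taken from the configuration posted at the start of the thread.
sched = parse_conf("policy_hierarchy OFS\nweight_tickets_share 100\n")
glob = parse_conf("#global:\nauto_user_delete_time 7776000\n")
for issue in share_tree_only_issues(sched, glob):
    print(issue)
```

The check only compares strings; it does not talk to the qmaster, so the text has to be captured from qconf beforehand.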

-- Reuti



Re: [gridengine users] Fair share policy

2019-02-27 Thread William Bryce
Hi Iyad,

Reuti is correct: the sge_priority man page explains how SGE calculates the
priority of jobs, and it includes the formula.  I will say that if you intend
to use share-tree with array jobs (i.e. qsub -t) then you will find that
the priority calculation is 'wrong' because it does not properly
account for array jobs.  The functional policy does not have
this issue, just the share tree policy.

Regards,

Bill.



Re: [gridengine users] Fair share policy

2019-02-27 Thread Kandalaft, Iyad (AAFC/AAC)
Hi Reuti,

I'm implementing only a share-tree.  The docs somewhere state something along
the lines of "use one or the other".
I've seen the man page.  It explains most of the math but leaves out some key
elements.  For example, how are "tickets" handed out and in what quantity (i.e.
why do some jobs get 2 tickets based on my configuration)?  Also, the
normalization function puts the values between 0 and 1, but based on what?
The number of tickets issued to the job divided by the total?
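
One possible reading of the normalization, offered as an assumption rather than a confirmed description of the scheduler: each value is scaled linearly between the smallest and largest value among the jobs being considered (min-max scaling), not divided by the total. The function name normalize, the 0.5 fallback for a flat range, and the ticket counts are all illustrative; check them against sge_priority(5) and the scheduler source before relying on them.

```python
def normalize(value, lo, hi):
    """Min-max scale value into [0, 1].

    Assumption: when all jobs share the same value (lo == hi) the result
    is 0.5, so that no job is favoured.
    """
    if hi == lo:
        return 0.5
    return (value - lo) / (hi - lo)


# Made-up ticket counts for three pending jobs.
tickets = {"job_a": 0, "job_b": 2, "job_c": 50}
lo, hi = min(tickets.values()), max(tickets.values())
ntckts = {job: normalize(t, lo, hi) for job, t in tickets.items()}
print(ntckts)
```

Under this reading, a job with 0 tickets normalizes to 0 and the job holding the most tickets normalizes to 1, regardless of how many tickets were issued in total.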

Thanks for your help.

Iyad Kandalaft


Re: [gridengine users] Fair share policy

2019-02-27 Thread Reuti
Hi,

There is a man page "man sge_priority". Which policy do you intend to use:
share-tree (honors past usage) or functional (current use), or both?

-- Reuti



[gridengine users] Fair share policy

2019-02-25 Thread Kandalaft, Iyad (AAFC/AAC)
Hi all,

I recently implemented a fair share policy using share tickets.  I've been
monitoring the cluster for a couple of days using qstat -pri -ext -u "*" in
order to see how the functional tickets are working, and it seems to have the
intended effect.  There are some anomalies where some running jobs have 0
tickets but still get scheduled since there are free resources; I assume this
is normal.

I'll admit that I don't fully understand the scheduling as it's somewhat 
complex.  So, I'm hoping someone can review the configuration to see if they 
can find any glaring issues such as conflicting options.

I created a share-tree and gave all users an equal value of 10:
$ qconf -sstree
id=0
name=Root
type=0
shares=1
childnodes=1
id=1
name=default
type=0
shares=10
childnodes=NONE
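
The flat output above can be grouped into one record per node. The following Python sketch does that for the tree shown; parse_share_tree and child_ids are hypothetical helper names, and the assumption that childnodes holds a comma-separated list of ids (or NONE for a leaf) should be checked against share_tree(5).

```python
def parse_share_tree(text):
    """Group the key=value lines of `qconf -sstree` output into one dict per node."""
    nodes = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "id":
            # Every node starts with its id line.
            current = {"id": int(value)}
            nodes[current["id"]] = current
        elif current is not None:
            current[key] = value
    return nodes


def child_ids(node):
    """Return the child ids of a node; NONE marks a leaf (assumed format)."""
    raw = node.get("childnodes", "NONE")
    return [] if raw == "NONE" else [int(c) for c in raw.split(",")]


SSTREE = """id=0
name=Root
type=0
shares=1
childnodes=1
id=1
name=default
type=0
shares=10
childnodes=NONE
"""

tree = parse_share_tree(SSTREE)
for node_id, node in tree.items():
    print(node_id, node["name"], node["shares"], child_ids(node))
```

For this tree the result is a Root node with a single leaf named default carrying 10 shares.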

I modified the scheduling by setting weight_tickets_share to 100. I reduced
weight_waiting_time, weight_priority, and weight_urgency to well below
weight_ticket (what are good values?).
$ qconf -ssconf
algorithm                         default
schedule_interval                 0:0:15
maxujobs                          0
queue_sort_method                 seqno
job_load_adjustments              np_load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      np_load_avg
schedd_job_info                   false
flush_submit_sec                  0
flush_finish_sec                  0
params                            none
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=0.70,mem=0.20,io=0.10
compensation_factor               5.00
weight_user                       0.25
weight_project                    0.25
weight_department                 0.25
weight_job                        0.25
weight_tickets_functional         0
weight_tickets_share              100
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   200
report_pjob_tickets               TRUE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     0.50
weight_waiting_time               0.10
weight_deadline                   360.00
weight_urgency                    0.01
weight_priority                   0.01
max_reservation                   0
default_duration                  INFINITY
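
To see what these weights imply, here is a small Python sketch of the top-level formula as described in sge_priority(5): a weighted sum of the normalized ticket, urgency, and POSIX priority values. The function name job_priority and the sample inputs are made up for illustration. Note that weight_waiting_time and weight_deadline feed into the urgency value rather than into this sum directly, which is an assumption worth verifying in the man page.

```python
# Weights copied from the scheduler configuration above.
WEIGHTS = {
    "weight_ticket": 0.50,
    "weight_urgency": 0.01,
    "weight_priority": 0.01,
}


def job_priority(ntckts, nurg, npprio, weights=WEIGHTS):
    """Combine normalized values (each in [0, 1]) into one priority."""
    return (weights["weight_priority"] * npprio
            + weights["weight_urgency"] * nurg
            + weights["weight_ticket"] * ntckts)


# A job holding the most tickets, with middling urgency and POSIX priority.
many_tickets = job_priority(ntckts=1.0, nurg=0.5, npprio=0.5)
# A job with no tickets but the highest urgency and POSIX priority.
no_tickets = job_priority(ntckts=0.0, nurg=1.0, npprio=1.0)
print(many_tickets, no_tickets)
```

With these weights the ticket term dominates: even maximal urgency and POSIX priority contribute at most 0.02 together, against up to 0.50 from tickets.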

I modified all the users to set the fshare to 1000
$ qconf -muser XXX

I modified the general conf to set auto_user_fshare to 1000 and
auto_user_delete_time to 7776000 (90 days).  Halftime is set to the default
7 days (I assume I should increase this).  I don't know if
auto_user_delete_time even matters.
$ qconf -sconf
#global:
execd_spool_dir              /opt/gridengine/default/spool
mailer                       /opt/gridengine/default/commond/mail_wrapper.py
xterm                        /usr/bin/xterm
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 sh,bash
min_uid                      100
min_gid                      100
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_info
administrator_mail           none
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 ENABLE_BINDING=true ENABLE_ADDGRP_KILL=true \
                             H_DESCRIPTORS=16K
reporting_params             accounting=true reporting=true \
                             flush_time=00:00:15 joblog=true sharelog=00:00:00
finished_jobs                100
gid_range                    2-20100
qlogin_command               /opt/gridengine/bin/rocks-qlogin.sh
qlogin_daemon                /usr/sbin/sshd -i
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
max_advance_reservations     0
auto_user_oticket            0
auto_user_fshare             1000
auto_user_default_project    none
auto_user_delete_time        7776000
delegated_file_staging       false
reprioritize                 0
jsv_url                      none
jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
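
For the halftime and auto_user_delete_time values, a short Python sketch of the arithmetic may help. It uses the plain half-life definition (usage halves every halftime hours) and a weighted sum over usage_weight_list; whether the scheduler applies the decay continuously or in steps is not stated in this thread, so treat the function names decay_factor and combined_usage and the sample usage figures as illustrative assumptions.

```python
HALFTIME_HOURS = 168             # halftime from qconf -ssconf (7 days)
AUTO_USER_DELETE_TIME = 7776000  # seconds, from qconf -sconf
USAGE_WEIGHTS = {"cpu": 0.70, "mem": 0.20, "io": 0.10}  # usage_weight_list


def decay_factor(hours_elapsed, halftime_hours=HALFTIME_HOURS):
    """Fraction of past usage still counted after hours_elapsed hours."""
    return 0.5 ** (hours_elapsed / halftime_hours)


def combined_usage(usage, weights=USAGE_WEIGHTS):
    """Weighted sum of cpu, mem and io usage (assumed combination rule)."""
    return sum(weights[name] * usage.get(name, 0.0) for name in weights)


delete_days = AUTO_USER_DELETE_TIME / 86400  # 86400 seconds per day
print(delete_days)                 # 90.0
print(decay_factor(168))           # 0.5 after one week
print(decay_factor(4 * 168))       # 0.0625 after four weeks

# Made-up usage figures, decayed by two weeks of age.
raw = combined_usage({"cpu": 100.0, "mem": 10.0, "io": 0.0})
print(raw * decay_factor(2 * 168))
```

With a 7-day halftime, usage from four weeks ago counts for about 6% of its original value, so a 90-day auto_user_delete_time comfortably outlasts any usage that still carries weight.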

Thanks for your assistance.

Cheers

Iyad Kandalaft

