Hi Bill,

Yes, I think so!
https://arc.liv.ac.uk/trac/SGE/ticket/1550

Best,

Mark

On Thu, 28 Feb 2019, William Bryce wrote:

> Yup, there was one we saw when you have a share tree, turn off past usage
> (basically don't consider past usage), and have both parallel and serial
> jobs in the system. The share tree ended up unbalanced in favour of the
> parallel jobs. Did you guys fix that one too?
>
> Regards,
>
> Bill.
>
> On Thu, Feb 28, 2019 at 7:54 AM William Bryce <bbr...@univa.com> wrote:
>
>> Didn't know that, Mark. That is great. I remember there was more than
>> one issue with share tree and array jobs that we saw, but it didn't
>> happen in the default share tree configuration. I will have to check.
>>
>> Regards,
>>
>> Bill
>>
>> Sent from my iPhone
>>
>>> On Feb 28, 2019, at 4:32 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote:
>>>
>>> Hi Bill,
>>>
>>> I fixed that share-tree-array-jobs priority problem some time ago,
>>> unless you're thinking of a different one?
>>>
>>> https://arc.liv.ac.uk/trac/SGE/ticket/435
>>> https://arc.liv.ac.uk/trac/SGE/changeset/4840/sge
>>>
>>> We use share tree and array jobs all the time with no problems. It
>>> made it into a Son of Gridengine release.
>>>
>>> Best,
>>>
>>> Mark
>>>
>>>> On Wed, 27 Feb 2019, William Bryce wrote:
>>>>
>>>> Hi Iyad,
>>>>
>>>> Reuti is correct: "man sge_priority" explains how SGE calculates the
>>>> priority of jobs, and it includes the formula. I will say that if you
>>>> intend to use share-tree with array jobs (i.e. qsub -t), then you
>>>> will find that the priority calculation is 'wrong' because it does
>>>> not properly account for array jobs. The functional policy does not
>>>> have this issue - just the share tree policy.
>>>>
>>>> Regards,
>>>>
>>>> Bill.
>>>>
>>>> On Wed, Feb 27, 2019 at 4:10 PM Kandalaft, Iyad (AAFC/AAC) <
>>>> iyad.kandal...@canada.ca> wrote:
>>>>
>>>>> Hi Reuti,
>>>>>
>>>>> I'm implementing only a share-tree.
>>>>> The docs somewhere state something along the lines of "use one or
>>>>> the other".
>>>>>
>>>>> I've seen the man page; it explains most of the math but leaves out
>>>>> some key elements. For example, how are "tickets" handed out, and in
>>>>> what quantity (i.e. why do some jobs get 20000 tickets based on my
>>>>> configuration below)? Also, the normalization function puts the
>>>>> values between 0 and 1, but based on what? The number of tickets
>>>>> issued to the job divided by the total?
>>>>>
>>>>> Thanks for your help.
>>>>>
>>>>> Iyad Kandalaft
>>>>>
>>>>> -----Original Message-----
>>>>> From: Reuti <re...@staff.uni-marburg.de>
>>>>> Sent: Wednesday, February 27, 2019 4:00 PM
>>>>> To: Kandalaft, Iyad (AAFC/AAC) <iyad.kandal...@canada.ca>
>>>>> Cc: users@gridengine.org
>>>>> Subject: Re: [gridengine users] Fair share policy
>>>>>
>>>>> Hi,
>>>>>
>>>>> There is a man page, "man sge_priority". Which policy do you intend
>>>>> to use: share-tree (honors past usage), functional (current use), or
>>>>> both?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>> On 25.02.2019 at 15:03, Kandalaft, Iyad (AAFC/AAC) <
>>>>>> iyad.kandal...@canada.ca> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I recently implemented a fair share policy using share tickets.
>>>>>> I've been monitoring the cluster for a couple of days using
>>>>>> qstat -pri -ext -u "*" in order to see how the functional tickets
>>>>>> are working, and it seems to have the intended effect. There are
>>>>>> some anomalies where some running jobs have 0 tickets but still get
>>>>>> scheduled since there are free resources; I assume this is normal.
>>>>>>
>>>>>> I'll admit that I don't fully understand the scheduling as it's
>>>>>> somewhat complex. So, I'm hoping someone can review the
>>>>>> configuration to see if they can find any glaring issues such as
>>>>>> conflicting options.
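[For the normalization question above: sge_priority(5) describes the final priority as a weighted sum of three inputs - POSIX priority, urgency, and tickets - each normalized to the range [0, 1] across the pending job list. A minimal sketch follows; the simple min-max normalization used here is an assumption for illustration, and the weights match the weight_priority / weight_urgency / weight_ticket values shown later in the thread. Check sge_priority(5) for the exact scheme.]

```python
# Sketch of the GE priority calculation described in sge_priority(5):
#   prio = weight_priority * npprio + weight_urgency * nurg + weight_ticket * ntckts
# The n* terms are the job's POSIX priority, urgency and ticket count,
# each normalized to [0, 1] over the pending jobs. The min-max
# normalization below is an assumption, not the verified implementation.

def normalize(values):
    """Min-max normalize a list of numbers to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                 # all values equal -> everyone gets 1.0
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def priorities(jobs, w_priority=0.01, w_urgency=0.01, w_ticket=0.5):
    """jobs: list of (posix_prio, urgency, tickets) tuples, one per pending job."""
    npprio = normalize([j[0] for j in jobs])
    nurg   = normalize([j[1] for j in jobs])
    ntckts = normalize([j[2] for j in jobs])
    return [w_priority * p + w_urgency * u + w_ticket * t
            for p, u, t in zip(npprio, nurg, ntckts)]

# Two jobs with equal POSIX priority and urgency; the one holding more
# share-tree tickets comes out ahead.
print(priorities([(0, 1000, 20000), (0, 1000, 5000)]))  # -> [0.52, 0.02]
```

[With weight_ticket dominating the other weights, as in the configuration below, the ticket term decides the ordering almost entirely.]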
>>>>>>
>>>>>> I created a share-tree and gave all users an equal value of 10:
>>>>>>
>>>>>> $ qconf -sstree
>>>>>> id=0
>>>>>> name=Root
>>>>>> type=0
>>>>>> shares=1
>>>>>> childnodes=1
>>>>>> id=1
>>>>>> name=default
>>>>>> type=0
>>>>>> shares=10
>>>>>> childnodes=NONE
>>>>>>
>>>>>> I modified the scheduler configuration by setting
>>>>>> weight_tickets_share to 1000000. I reduced weight_waiting_time,
>>>>>> weight_priority and weight_urgency to well below weight_ticket
>>>>>> (what are good values?).
>>>>>>
>>>>>> $ qconf -ssconf
>>>>>> algorithm                         default
>>>>>> schedule_interval                 0:0:15
>>>>>> maxujobs                          0
>>>>>> queue_sort_method                 seqno
>>>>>> job_load_adjustments              np_load_avg=0.50
>>>>>> load_adjustment_decay_time        0:7:30
>>>>>> load_formula                      np_load_avg
>>>>>> schedd_job_info                   false
>>>>>> flush_submit_sec                  0
>>>>>> flush_finish_sec                  0
>>>>>> params                            none
>>>>>> reprioritize_interval             0:0:0
>>>>>> halftime                          168
>>>>>> usage_weight_list                 cpu=0.700000,mem=0.200000,io=0.100000
>>>>>> compensation_factor               5.000000
>>>>>> weight_user                       0.250000
>>>>>> weight_project                    0.250000
>>>>>> weight_department                 0.250000
>>>>>> weight_job                        0.250000
>>>>>> weight_tickets_functional         0
>>>>>> weight_tickets_share              1000000
>>>>>> share_override_tickets            TRUE
>>>>>> share_functional_shares           TRUE
>>>>>> max_functional_jobs_to_schedule   200
>>>>>> report_pjob_tickets               TRUE
>>>>>> max_pending_tasks_per_job         50
>>>>>> halflife_decay_list               none
>>>>>> policy_hierarchy                  OFS
>>>>>> weight_ticket                     0.500000
>>>>>> weight_waiting_time               0.000010
>>>>>> weight_deadline                   3600000.000000
>>>>>> weight_urgency                    0.010000
>>>>>> weight_priority                   0.010000
>>>>>> max_reservation                   0
>>>>>> default_duration                  INFINITY
>>>>>>
>>>>>> I modified all the users to set fshare to 1000: $ qconf -muser XXX
>>>>>>
>>>>>> I modified the global configuration to set auto_user_fshare to 1000
>>>>>> and auto_user_delete_time to 7776000 (90 days). Halftime is set to
>>>>>> the default of 7 days (I assume I should increase this). I don't
>>>>>> know if auto_user_delete_time even matters.
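[On the halftime question above: sched_conf(5) describes halftime (168 hours in this configuration, i.e. 7 days) as the half-life for decaying past usage in the share-tree policy. A minimal sketch of that decay follows; the closed-form expression is an assumption for illustration, since the scheduler actually applies the decay incrementally.]

```python
# Sketch of share-tree usage decay governed by "halftime" in sched_conf(5).
# Past usage halves every `halftime` hours (exponential decay). This
# closed-form version is an illustrative assumption; sge_qmaster updates
# the stored usage incrementally rather than evaluating this formula.

def decayed_usage(usage, hours_elapsed, halftime=168):
    """Return `usage` after exponential decay with the given half-life in hours."""
    return usage * 0.5 ** (hours_elapsed / halftime)

# With halftime=168 (7 days), week-old usage counts half as much,
# and two-week-old usage a quarter as much:
print(decayed_usage(1000.0, 168))   # -> 500.0
print(decayed_usage(1000.0, 336))   # -> 250.0
```

[Raising halftime makes old usage linger longer in the share-tree calculation; lowering it makes the policy react faster to recent consumption.]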
>>>>>>
>>>>>> $ qconf -sconf
>>>>>> #global:
>>>>>> execd_spool_dir              /opt/gridengine/default/spool
>>>>>> mailer                       /opt/gridengine/default/commond/mail_wrapper.py
>>>>>> xterm                        /usr/bin/xterm
>>>>>> load_sensor                  none
>>>>>> prolog                       none
>>>>>> epilog                       none
>>>>>> shell_start_mode             posix_compliant
>>>>>> login_shells                 sh,bash
>>>>>> min_uid                      100
>>>>>> min_gid                      100
>>>>>> user_lists                   none
>>>>>> xuser_lists                  none
>>>>>> projects                     none
>>>>>> xprojects                    none
>>>>>> enforce_project              false
>>>>>> enforce_user                 auto
>>>>>> load_report_time             00:00:40
>>>>>> max_unheard                  00:05:00
>>>>>> reschedule_unknown           00:00:00
>>>>>> loglevel                     log_info
>>>>>> administrator_mail           none
>>>>>> set_token_cmd                none
>>>>>> pag_cmd                      none
>>>>>> token_extend_time            none
>>>>>> shepherd_cmd                 none
>>>>>> qmaster_params               none
>>>>>> execd_params                 ENABLE_BINDING=true ENABLE_ADDGRP_KILL=true \
>>>>>>                              H_DESCRIPTORS=16K
>>>>>> reporting_params             accounting=true reporting=true \
>>>>>>                              flush_time=00:00:15 joblog=true sharelog=00:00:00
>>>>>> finished_jobs                100
>>>>>> gid_range                    20000-20100
>>>>>> qlogin_command               /opt/gridengine/bin/rocks-qlogin.sh
>>>>>> qlogin_daemon                /usr/sbin/sshd -i
>>>>>> rlogin_command               builtin
>>>>>> rlogin_daemon                builtin
>>>>>> rsh_command                  builtin
>>>>>> rsh_daemon                   builtin
>>>>>> max_aj_instances             2000
>>>>>> max_aj_tasks                 75000
>>>>>> max_u_jobs                   0
>>>>>> max_jobs                     0
>>>>>> max_advance_reservations     0
>>>>>> auto_user_oticket            0
>>>>>> auto_user_fshare             1000
>>>>>> auto_user_default_project    none
>>>>>> auto_user_delete_time        7776000
>>>>>> delegated_file_staging       false
>>>>>> reprioritize                 0
>>>>>> jsv_url                      none
>>>>>> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
>>>>>>
>>>>>> Thanks for your assistance.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Iyad Kandalaft
>>>>
>>>> --
>>>> *William Bryce* | VP of Products
>>>> Univa Corporation <http://www.univa.com/> - 130 Esna Park Drive,
>>>> Second Floor, Markham, Ontario, Canada
>>>> *Email* bbr...@univa.com | *Mobile: 647.974.2841* | *Office: 647.478.5974*

-- 
-------------------------------------------------------------------
Mark Dixon                           Email    : m.c.di...@leeds.ac.uk
Advanced Research Computing (ARC)    Tel (int): 35429
IT Services building                 Tel (ext): +44 (0)113 343 5429
University of Leeds, LS2 9JT, UK
-------------------------------------------------------------------
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users