Didn’t know that, Mark. That is great. I remember there was more than one issue with share tree and array jobs that we saw, but it didn’t happen in the default share tree configuration. I will have to check.

Regards

Bill

Sent from my iPhone

> On Feb 28, 2019, at 4:32 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote:
>
> Hi Bill,
>
> I fixed that share-tree-array-jobs priority problem some time ago, unless
> you're thinking of a different one?
>
> https://arc.liv.ac.uk/trac/SGE/ticket/435
> https://arc.liv.ac.uk/trac/SGE/changeset/4840/sge
>
> We use share tree and array jobs all the time with no problems. It made it
> into a Son of Gridengine release.
>
> Best,
>
> Mark
>
>> On Wed, 27 Feb 2019, William Bryce wrote:
>>
>> Hi Iyad,
>>
>> Reuti is correct: "man sge_priority" explains how SGE calculates the
>> priority of jobs. It includes the formula. I will say that if you intend
>> to use the share tree policy with array jobs (i.e. qsub -t), then you will
>> find that the priority calculation is 'wrong' because it does not properly
>> account for array jobs. The functional policy does not have this issue -
>> just the share tree policy.
>>
>> Regards,
>>
>> Bill.
>>
>>
>> On Wed, Feb 27, 2019 at 4:10 PM Kandalaft, Iyad (AAFC/AAC) <
>> iyad.kandal...@canada.ca> wrote:
>>
>>> Hi Reuti,
>>>
>>> I'm implementing only a share tree. The docs somewhere state something
>>> along the lines of "use one or the other".
>>> I've seen the man page; it explains most of the math but leaves out
>>> some key elements. For example, how are "tickets" handed out and in what
>>> quantity (i.e. why do some jobs get 20000 tickets based on my
>>> configuration below)? Also, the normalization function puts the values
>>> between 0 and 1, but based on what? The number of tickets issued to the
>>> job divided by the total?
>>>
>>> Thanks for your help.
>>>
>>> Iyad Kandalaft
>>>
>>> -----Original Message-----
>>> From: Reuti <re...@staff.uni-marburg.de>
>>> Sent: Wednesday, February 27, 2019 4:00 PM
>>> To: Kandalaft, Iyad (AAFC/AAC) <iyad.kandal...@canada.ca>
>>> Cc: users@gridengine.org
>>> Subject: Re: [gridengine users] Fair share policy
>>>
>>> Hi,
>>>
>>> There is a man page, "man sge_priority". Which policy do you intend to
>>> use: share tree (honors past usage), functional (current use), or both?
>>>
>>> -- Reuti
>>>
>>>
>>>> Am 25.02.2019 um 15:03 schrieb Kandalaft, Iyad (AAFC/AAC) <
>>>> iyad.kandal...@canada.ca>:
>>>>
>>>> Hi all,
>>>>
>>>> I recently implemented a fair share policy using share tickets. I've
>>>> been monitoring the cluster for a couple of days using
>>>> qstat -pri -ext -u "*" in order to see how the tickets are working,
>>>> and it seems to have the intended effect. There are some anomalies
>>>> where some running jobs have 0 tickets but still get scheduled since
>>>> there are free resources; I assume this is normal.
>>>>
>>>> I'll admit that I don't fully understand the scheduling, as it's
>>>> somewhat complex. So I'm hoping someone can review the configuration
>>>> to see if they can find any glaring issues, such as conflicting options.
>>>>
>>>> I created a share tree and gave all users an equal value of 10:
>>>>
>>>> $ qconf -sstree
>>>> id=0
>>>> name=Root
>>>> type=0
>>>> shares=1
>>>> childnodes=1
>>>> id=1
>>>> name=default
>>>> type=0
>>>> shares=10
>>>> childnodes=NONE
>>>>
>>>> I modified the scheduler configuration by setting weight_tickets_share
>>>> to 1000000. I reduced weight_waiting_time, weight_priority and
>>>> weight_urgency to well below weight_ticket (what are good values?).
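
On the "what are good values?" question: as far as I recall, sge_priority(5)
combines three normalized terms roughly as

    prio = weight_priority * npprior + weight_urgency * nurg + weight_ticket * ntckts

where each n* term is scaled into the 0..1 range. For the ticket term my
understanding (treat it as an assumption) is that a job's ticket count is
scaled against the largest count held by any pending job, so an absolute
figure like 20000 matters less than that job's share of the ticket pool.
A quick illustration with the weights from the config quoted below, assuming
the urgency and POSIX-priority terms both sit around 0.5 for a default job
(again an assumption):

    # tickets normalizing to 1.00 vs. 0.25, everything else equal
    0.01*0.5 + 0.01*0.5 + 0.5*1.00 = 0.51
    0.01*0.5 + 0.01*0.5 + 0.5*0.25 = 0.135

    # the live per-job numbers can be watched with
    $ qstat -urg -pri -ext -u "*"

With weight_ticket well above the other two weights, the share-tree tickets
dominate the ordering, which sounds like what you are after.
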
>>>> $ qconf -ssconf
>>>> algorithm                         default
>>>> schedule_interval                 0:0:15
>>>> maxujobs                          0
>>>> queue_sort_method                 seqno
>>>> job_load_adjustments              np_load_avg=0.50
>>>> load_adjustment_decay_time        0:7:30
>>>> load_formula                      np_load_avg
>>>> schedd_job_info                   false
>>>> flush_submit_sec                  0
>>>> flush_finish_sec                  0
>>>> params                            none
>>>> reprioritize_interval             0:0:0
>>>> halftime                          168
>>>> usage_weight_list                 cpu=0.700000,mem=0.200000,io=0.100000
>>>> compensation_factor               5.000000
>>>> weight_user                       0.250000
>>>> weight_project                    0.250000
>>>> weight_department                 0.250000
>>>> weight_job                        0.250000
>>>> weight_tickets_functional         0
>>>> weight_tickets_share              1000000
>>>> share_override_tickets            TRUE
>>>> share_functional_shares           TRUE
>>>> max_functional_jobs_to_schedule   200
>>>> report_pjob_tickets               TRUE
>>>> max_pending_tasks_per_job         50
>>>> halflife_decay_list               none
>>>> policy_hierarchy                  OFS
>>>> weight_ticket                     0.500000
>>>> weight_waiting_time               0.000010
>>>> weight_deadline                   3600000.000000
>>>> weight_urgency                    0.010000
>>>> weight_priority                   0.010000
>>>> max_reservation                   0
>>>> default_duration                  INFINITY
>>>>
>>>> I modified all the users to set fshare to 1000:
>>>>
>>>> $ qconf -muser XXX
>>>>
>>>> I modified the global configuration to set auto_user_fshare to 1000 and
>>>> auto_user_delete_time to 7776000 (90 days). Halftime is set to the
>>>> default 7 days (I assume I should increase this). I don't know if
>>>> auto_user_delete_time even matters.
>>>>
>>>> $ qconf -sconf
>>>> #global:
>>>> execd_spool_dir              /opt/gridengine/default/spool
>>>> mailer                       /opt/gridengine/default/commond/mail_wrapper.py
>>>> xterm                        /usr/bin/xterm
>>>> load_sensor                  none
>>>> prolog                       none
>>>> epilog                       none
>>>> shell_start_mode             posix_compliant
>>>> login_shells                 sh,bash
>>>> min_uid                      100
>>>> min_gid                      100
>>>> user_lists                   none
>>>> xuser_lists                  none
>>>> projects                     none
>>>> xprojects                    none
>>>> enforce_project              false
>>>> enforce_user                 auto
>>>> load_report_time             00:00:40
>>>> max_unheard                  00:05:00
>>>> reschedule_unknown           00:00:00
>>>> loglevel                     log_info
>>>> administrator_mail           none
>>>> set_token_cmd                none
>>>> pag_cmd                      none
>>>> token_extend_time            none
>>>> shepherd_cmd                 none
>>>> qmaster_params               none
>>>> execd_params                 ENABLE_BINDING=true ENABLE_ADDGRP_KILL=true \
>>>>                              H_DESCRIPTORS=16K
>>>> reporting_params             accounting=true reporting=true \
>>>>                              flush_time=00:00:15 joblog=true sharelog=00:00:00
>>>> finished_jobs                100
>>>> gid_range                    20000-20100
>>>> qlogin_command               /opt/gridengine/bin/rocks-qlogin.sh
>>>> qlogin_daemon                /usr/sbin/sshd -i
>>>> rlogin_command               builtin
>>>> rlogin_daemon                builtin
>>>> rsh_command                  builtin
>>>> rsh_daemon                   builtin
>>>> max_aj_instances             2000
>>>> max_aj_tasks                 75000
>>>> max_u_jobs                   0
>>>> max_jobs                     0
>>>> max_advance_reservations     0
>>>> auto_user_oticket            0
>>>> auto_user_fshare             1000
>>>> auto_user_default_project    none
>>>> auto_user_delete_time        7776000
>>>> delegated_file_staging       false
>>>> reprioritize                 0
>>>> jsv_url                      none
>>>> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
>>>>
>>>> Thanks for your assistance.
>>>>
>>>> Cheers
>>>>
>>>> Iyad Kandalaft
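
On the halftime question: the value is given in hours, so 168 is the one-week
default, and stretching how long past usage keeps counting against a user is
just an edit to the scheduler configuration. My understanding of
auto_user_delete_time is that it only controls how long an automatically
created user object (together with the usage it has accumulated) is kept
after the user's last job, so 90 days seems reasonable. A sketch, with the
new halftime picked purely as an example:

    # raise the past-usage half-life from 7 to 28 days (halftime is in hours)
    $ qconf -msconf            # change "halftime 168" to "halftime 672"

    # dump the share tree together with the usage accumulated per node;
    # sge_share_mon normally sits under utilbin/<arch>, but the exact path
    # depends on how the build was installed
    $ $SGE_ROOT/utilbin/$($SGE_ROOT/util/arch)/sge_share_mon
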
>>
>>
>> --
>> William Bryce | VP of Products
>> Univa Corporation <http://www.univa.com/> - 130 Esna Park Drive, Second
>> Floor, Markham, Ontario, Canada
>> Email bbr...@univa.com | Mobile: 647.974.2841 | Office: 647.478.5974

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users