Hi Reuti,

I'm implementing only a share-tree. The docs somewhere state something along the lines of "use one or the other". I've seen the man page. It explains most of the math but leaves out some key elements. For example, how are "tickets" handed out and in what quantity (e.g. why do some jobs get 20000 tickets based on my configuration below)? Also, the normalization function puts the values between 0 and 1, but based on what? The number of tickets issued to the job divided by the total?

Thanks for your help.

Iyad Kandalaft

-----Original Message-----
From: Reuti <re...@staff.uni-marburg.de>
Sent: Wednesday, February 27, 2019 4:00 PM
To: Kandalaft, Iyad (AAFC/AAC) <iyad.kandal...@canada.ca>
Cc: users@gridengine.org
Subject: Re: [gridengine users] Fair share policy

Hi,

there is a man page "man sge_priority". Which policy do you intend to use: share-tree (honors past usage) or functional (current use), or both?

-- Reuti

> On 25.02.2019 at 15:03, Kandalaft, Iyad (AAFC/AAC)
> <iyad.kandal...@canada.ca> wrote:
>
> Hi all,
>
> I recently implemented a fair share policy using share tickets. I’ve been
> monitoring the cluster for a couple of days using qstat -pri -ext -u "*" in
> order to see how the functional tickets are working and it seems to have the
> intended effect. There are some anomalies where some running jobs have 0
> tickets but still get scheduled since there are free resources; I assume this
> is normal.
>
> I’ll admit that I don’t fully understand the scheduling as it’s somewhat
> complex. So, I’m hoping someone can review the configuration to see if they
> can find any glaring issues such as conflicting options.
>
> I created a share-tree and gave all users an equal value of 10:
> $ qconf -sstree
> id=0
> name=Root
> type=0
> shares=1
> childnodes=1
> id=1
> name=default
> type=0
> shares=10
> childnodes=NONE
>
> I modified the scheduling by setting the weight_tickets_share to 1000000. I
> reduced the weight_waiting_time, weight_priority, and weight_urgency to well
> below the weight_ticket (what are good values?).
> $ qconf -ssconf
> algorithm                         default
> schedule_interval                 0:0:15
> maxujobs                          0
> queue_sort_method                 seqno
> job_load_adjustments              np_load_avg=0.50
> load_adjustment_decay_time        0:7:30
> load_formula                      np_load_avg
> schedd_job_info                   false
> flush_submit_sec                  0
> flush_finish_sec                  0
> params                            none
> reprioritize_interval             0:0:0
> halftime                          168
> usage_weight_list                 cpu=0.700000,mem=0.200000,io=0.100000
> compensation_factor               5.000000
> weight_user                       0.250000
> weight_project                    0.250000
> weight_department                 0.250000
> weight_job                        0.250000
> weight_tickets_functional         0
> weight_tickets_share              1000000
> share_override_tickets            TRUE
> share_functional_shares           TRUE
> max_functional_jobs_to_schedule   200
> report_pjob_tickets               TRUE
> max_pending_tasks_per_job         50
> halflife_decay_list               none
> policy_hierarchy                  OFS
> weight_ticket                     0.500000
> weight_waiting_time               0.000010
> weight_deadline                   3600000.000000
> weight_urgency                    0.010000
> weight_priority                   0.010000
> max_reservation                   0
> default_duration                  INFINITY
>
> I modified all the users to set the fshare to 1000:
> $ qconf -muser XXX
>
> I modified the general conf to set auto_user_fshare to 1000 and
> auto_user_delete_time to 7776000 (90 days). Halftime is set to the default
> of 7 days (I assume I should increase this). I don’t know if
> auto_user_delete_time even matters.
>
> $ qconf -sconf
> #global:
> execd_spool_dir              /opt/gridengine/default/spool
> mailer                       /opt/gridengine/default/commond/mail_wrapper.py
> xterm                        /usr/bin/xterm
> load_sensor                  none
> prolog                       none
> epilog                       none
> shell_start_mode             posix_compliant
> login_shells                 sh,bash
> min_uid                      100
> min_gid                      100
> user_lists                   none
> xuser_lists                  none
> projects                     none
> xprojects                    none
> enforce_project              false
> enforce_user                 auto
> load_report_time             00:00:40
> max_unheard                  00:05:00
> reschedule_unknown           00:00:00
> loglevel                     log_info
> administrator_mail           none
> set_token_cmd                none
> pag_cmd                      none
> token_extend_time            none
> shepherd_cmd                 none
> qmaster_params               none
> execd_params                 ENABLE_BINDING=true ENABLE_ADDGRP_KILL=true \
>                              H_DESCRIPTORS=16K
> reporting_params             accounting=true reporting=true \
>                              flush_time=00:00:15 joblog=true sharelog=00:00:00
> finished_jobs                100
> gid_range                    20000-20100
> qlogin_command               /opt/gridengine/bin/rocks-qlogin.sh
> qlogin_daemon                /usr/sbin/sshd -i
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin
> max_aj_instances             2000
> max_aj_tasks                 75000
> max_u_jobs                   0
> max_jobs                     0
> max_advance_reservations     0
> auto_user_oticket            0
> auto_user_fshare             1000
> auto_user_default_project    none
> auto_user_delete_time        7776000
> delegated_file_staging       false
> reprioritize                 0
> jsv_url                      none
> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
>
> Thanks for your assistance.
>
> Cheers
>
> Iyad Kandalaft
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
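
On the normalization question raised in this thread, here is a rough Python sketch of how the final job priority described in "man sge_priority" might be combined from its three parts. It is not taken from the Grid Engine source. The weighted sum follows the man page; the min-max scaling across the compared jobs (with 0.5 when all values are equal), the fixed -1023..1024 range for the POSIX priority, and the names normalize and job_priorities are assumptions made for illustration. The default weights are the weight_ticket, weight_urgency and weight_priority values from the qconf -ssconf output quoted above.

```python
def normalize(values):
    """Scale a list of numbers into [0, 1].

    Assumption: scaling is relative to the smallest and largest value among
    the jobs being compared, not to the sum of all tickets. If every value is
    equal, 0.5 is used for all of them.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def job_priorities(tickets, urgencies, posix_priorities,
                   weight_ticket=0.5, weight_urgency=0.01,
                   weight_priority=0.01):
    """Combine tickets, urgency and POSIX priority into one value per job.

    Follows the weighted sum given in sge_priority:
        prio = weight_priority * npri + weight_urgency * nurg
               + weight_ticket * ntix
    """
    ntix = normalize(tickets)
    nurg = normalize(urgencies)
    # Assumption: the POSIX priority is scaled over its fixed range
    # of -1023..1024 rather than over the values currently in use.
    npri = [(p + 1023) / 2047 for p in posix_priorities]
    return [
        weight_priority * p + weight_urgency * u + weight_ticket * t
        for t, u, p in zip(ntix, nurg, npri)
    ]


if __name__ == "__main__":
    # Three hypothetical pending jobs: only the ticket counts differ.
    tickets = [20000, 10000, 0]
    urgencies = [1000, 1000, 1000]
    posix_priorities = [0, 0, 0]
    for tix, prio in zip(tickets,
                         job_priorities(tickets, urgencies, posix_priorities)):
        print(f"tickets={tix:6d}  prio={prio:.5f}")
```

Under these assumptions, a job with 0 tickets still gets a non-zero priority from the urgency and POSIX priority terms, and with weight_ticket at 0.5 against 0.01 for the other two weights, the ticket term dominates the ordering.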