On 09.05.2013 at 18:51, Chris Paciorek wrote:

> We're having a problem similar to that described in this thread:
> http://www.mentby.com/Group/grid-engine/62u4-resource-reservation-not-working-for-some-jobs.html
> 
> We're running Grid Engine 6.2u5 for a cluster of 4 Linux nodes (32 cores 
> each) running Ubuntu 12.04 (Precise). 
> 
> We're seeing that jobs that request a reservation and are at the top of the 
> queue are not starting, with lower-priority jobs that are requesting fewer 
> cores slipping ahead of the higher priority job. An example of this is at the 
> bottom of this posting.

Apart from the configured "default_duration 7200:00:00": what h_rt/s_rt request, 
if any, was supplied to the short jobs?
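
The reason the runtime request matters can be illustrated with a little arithmetic. The following is a minimal Python sketch, not the scheduler's actual code: it assumes the simplified rule that a job may use reserved resources only if it is expected to finish before the reservation starts, and that a job without an h_rt/s_rt request is treated as running for default_duration. The function names, the value of "now" and the two-hour request are hypothetical; the reservation start time is taken from the RESERVING records quoted further down.

```python
def duration_to_seconds(hms: str) -> int:
    """Convert an 'h:m:s' string such as '7200:00:00' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def may_backfill(now: int, assumed_runtime: int, reservation_start: int) -> bool:
    """Simplified rule (an assumption of this sketch): the job fits only
    if it is expected to end before the reservation begins."""
    return now + assumed_runtime <= reservation_start


default_duration = duration_to_seconds("7200:00:00")  # from "qconf -ssconf"
reservation_start = 1369228520  # start_time field of the RESERVING records
now = reservation_start - 13 * 24 * 3600  # hypothetical: 13 days earlier

# No h_rt/s_rt request: the job is assumed to run for default_duration.
print(may_backfill(now, default_duration, reservation_start))
# Hypothetical two-hour h_rt request: the job would fit before the reservation.
print(may_backfill(now, duration_to_seconds("2:00:00"), reservation_start))
```

Under these assumptions only the job with an explicit short runtime could be backfilled onto the reserved slots. This says nothing about jobs that start on slots the reservation does not cover.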

-- Reuti


> Here are the results of "qconf -ssconf":
> algorithm                         default
> schedule_interval                 0:0:15
> maxujobs                          0
> queue_sort_method                 load
> job_load_adjustments              np_load_avg=0.50
> load_adjustment_decay_time        0:7:30
> load_formula                      np_load_avg
> schedd_job_info                   true
> flush_submit_sec                  0
> flush_finish_sec                  0
> params                            MONITOR=1
> reprioritize_interval             0:0:0
> halftime                          720
> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> compensation_factor               5.000000
> weight_user                       0.250000
> weight_project                    0.250000
> weight_department                 0.250000
> weight_job                        0.250000
> weight_tickets_functional         0
> weight_tickets_share              100000
> share_override_tickets            TRUE
> share_functional_shares           TRUE
> max_functional_jobs_to_schedule   200
> report_pjob_tickets               TRUE
> max_pending_tasks_per_job         50
> halflife_decay_list               none
> policy_hierarchy                  SOF
> weight_ticket                     1.000000
> weight_waiting_time               0.278000
> weight_deadline                   3600000.000000
> weight_urgency                    0.000000
> weight_priority                   0.000000
> max_reservation                   10
> default_duration                  7200:00:00
> 
> Here's the example:
> 
> Job #34378 was submitted as:
> qsub -pe smp 16 -R y -b y "R CMD BATCH --no-save tmp.R tmp.out"
> 
> 
> Soon after submitting #34378, we see that the job #34378 is next in line:
> job-ID  prior   name       user         state submit/start at     queue             slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>   33004 0.11762 tophat.sh  seqc         r     04/24/2013 07:14:20 [email protected]    32
>   33718 0.12405 fooSU_long lwtai        r     05/06/2013 17:01:58 [email protected]     1
>   33719 0.12405 fooSV_long lwtai        r     05/06/2013 17:01:58 [email protected]     1
>   33720 0.12405 fooWV_long lwtai        r     05/06/2013 17:01:58 [email protected]     1
>   33721 0.12405 fooWU_long lwtai        r     05/06/2013 17:01:58 [email protected]     1
>   33745 0.06583 toy.sh     yjhuoh       r     05/07/2013 22:29:28 [email protected]     1
>   33758 0.06583 toy.sh     yjhuoh       r     05/07/2013 22:30:28 [email protected]     1
>   33763 0.06583 toy.sh     yjhuoh       r     05/07/2013 22:33:58 [email protected]     1
>   33787 0.06583 toy.sh     yjhuoh       r     05/08/2013 00:15:58 [email protected]     1
>   33794 0.06583 toy.sh     yjhuoh       r     05/08/2013 01:45:58 [email protected]     1
>   34183 0.00570 SubSampleF isoform      r     05/09/2013 03:29:32 [email protected]     8
>   34185 0.00570 SubSampleF isoform      r     05/09/2013 04:27:47 [email protected]     8
>   34186 0.00570 SubSampleF isoform      r     05/09/2013 04:36:47 [email protected]     8
>   34187 0.00570 SubSampleF isoform      r     05/09/2013 05:05:02 [email protected]     8
>   34188 0.00570 SubSampleF isoform      r     05/09/2013 05:42:17 [email protected]     8
>   34189 0.00570 SubSampleF isoform      r     05/09/2013 06:12:47 [email protected]     8
>   34190 0.00570 SubSampleF isoform      r     05/09/2013 06:14:17 [email protected]     8
>   34191 0.00570 SubSampleF isoform      r     05/09/2013 07:07:32 [email protected]     8
>   34192 0.00570 SubSampleF isoform      r     05/09/2013 07:24:02 [email protected]     8
>   34194 0.00570 SubSampleF isoform      r     05/09/2013 07:37:17 [email protected]     8
>   34378 1.00000 R CMD BATC paciorek     qw    05/09/2013 08:14:31                      16
>   34195 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34196 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34197 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34198 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34199 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34200 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34201 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34202 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34203 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34204 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34205 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34206 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34207 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34208 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34209 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34210 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
> 
> A little while later, we see that jobs 34195-34198 have slipped ahead of 
> 34378:
> 
> job-ID  prior   name       user         state submit/start at     queue             slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>   33004 0.11790 tophat.sh  seqc         r     04/24/2013 07:14:20 [email protected]    32
>   33718 0.12398 fooSU_long lwtai        r     05/06/2013 17:01:58 [email protected]     1
>   33719 0.12398 fooSV_long lwtai        r     05/06/2013 17:01:58 [email protected]     1
>   33720 0.12398 fooWV_long lwtai        r     05/06/2013 17:01:58 [email protected]     1
>   33721 0.12398 fooWU_long lwtai        r     05/06/2013 17:01:58 [email protected]     1
>   33745 0.08234 toy.sh     yjhuoh       r     05/07/2013 22:29:28 [email protected]     1
>   33758 0.08234 toy.sh     yjhuoh       r     05/07/2013 22:30:28 [email protected]     1
>   33763 0.08234 toy.sh     yjhuoh       r     05/07/2013 22:33:58 [email protected]     1
>   33787 0.08234 toy.sh     yjhuoh       r     05/08/2013 00:15:58 [email protected]     1
>   34188 0.00568 SubSampleF isoform      r     05/09/2013 05:42:17 [email protected]     8
>   34189 0.00568 SubSampleF isoform      r     05/09/2013 06:12:47 [email protected]     8
>   34190 0.00568 SubSampleF isoform      r     05/09/2013 06:14:17 [email protected]     8
>   34191 0.00568 SubSampleF isoform      r     05/09/2013 07:07:32 [email protected]     8
>   34192 0.00568 SubSampleF isoform      r     05/09/2013 07:24:02 [email protected]     8
>   34194 0.00568 SubSampleF isoform      r     05/09/2013 07:37:17 [email protected]     8
>   34195 0.00568 SubSampleF isoform      r     05/09/2013 08:16:47 [email protected]     8
>   34196 0.00568 SubSampleF isoform      r     05/09/2013 08:47:32 [email protected]     8
>   34197 0.00568 SubSampleF isoform      r     05/09/2013 09:11:02 [email protected]     8
>   34198 0.00568 SubSampleF isoform      r     05/09/2013 09:16:32 [email protected]     8
>   34378 1.00000 R CMD BATC paciorek     qw    05/09/2013 08:14:31                      16
>   34199 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34200 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34201 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34202 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34203 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34204 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34205 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34206 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34207 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34208 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34209 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34210 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                       8
>   34211 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                       8
>   34212 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                       8
>   34213 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                       8
>   34214 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                       8
>   34215 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                       8
>   34216 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                       8
>   34217 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                       8
>   34218 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                       8
> 
> The schedule file shows that there are RESERVING statements for #34378:
> 34378:1:RESERVING:1369228520:25920060:P:smp:slots:16.000000
> 34378:1:RESERVING:1369228520:25920060:Q:[email protected]:slots:16.000000
> 
> Perhaps the issue is that the reservation seems specific to the cluster node 
> "scf-sm02.Berkeley.EDU", and that specific node is occupied by a long-running 
> job (#33004). If so, is there any way to have the reservation not tied to a 
> node?
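
To make the two RESERVING records easier to read, here is a small Python sketch that splits one of them into named fields. It assumes the colon-separated layout described for the MONITOR parameter in sge_sched_conf(5) (job id, task id, state, start time, duration, level character, object name, resource name, utilization) and the level letters P, G, H and Q as I understand them from that page. The class and function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ScheduleRecord(NamedTuple):
    job_id: int
    task_id: int
    state: str
    start_time: int   # seconds since the epoch
    duration: int     # seconds
    level: str        # single letter, see LEVELS
    object_name: str
    resource: str
    utilization: float


# Assumed meaning of the level letters (per sge_sched_conf(5)).
LEVELS = {"P": "parallel environment", "G": "global", "H": "host", "Q": "queue"}


def parse_record(line: str) -> ScheduleRecord:
    """Split one line of the schedule file into its nine fields."""
    fields = line.strip().split(":")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    job, task, state, start, duration, level, obj, resource, util = fields
    return ScheduleRecord(int(job), int(task), state, int(start),
                          int(duration), level, obj, resource, float(util))


rec = parse_record("34378:1:RESERVING:1369228520:25920060:P:smp:slots:16.000000")
print(LEVELS[rec.level], rec.object_name, rec.resource, rec.utilization)
# Reservation start as a UTC timestamp.
print(datetime.fromtimestamp(rec.start_time, tz=timezone.utc).isoformat())
# Seconds by which the duration exceeds default_duration (7200 hours).
print(rec.duration - 7200 * 3600)
```

Read this way, the reserved duration is the configured default_duration of 7200 hours plus 60 seconds, and the second record is the same reservation booked at the queue level.
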
> 
> -Chris
> 
> ----------------------------------------------------------------------------------------------
> Chris Paciorek 
> 
> Statistical Computing Consultant, Associate Research Statistician, Lecturer
> 
> Office: 495 Evans Hall                      Email: [email protected]
> Mailing Address:                            Voice: 510-842-6670 
> Department of Statistics                    Fax:   510-642-7892
> 367 Evans Hall                              Skype: cjpaciorek
> University of California, Berkeley          WWW:   
> www.stat.berkeley.edu/~paciorek
> Berkeley, CA 94720 USA                      Permanent forward: 
> [email protected]
> 
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

