On 07.10.2013, at 16:09, Txema Heredia wrote:

> On 07/10/13 16:00, Reuti wrote:
>> On 07.10.2013, at 15:59, Txema Heredia wrote:
>> 
>>> On 07/10/13 14:58, Reuti wrote:
>>>> Hi,
>>>> 
>>>> On 07.10.2013, at 13:15, Txema Heredia wrote:
>>>> 
>>>>> The problem is that, right now, making h_rt mandatory is not an 
>>>>> option. So we need to work on the assumption that all jobs will last 
>>>>> to infinity and beyond.
>>>>> 
>>>>> Right now, the scheduler configuration is:
>>>>> max_reservation 50
>>>>> default_duration 24:00:00
>>>>> 
>>>>> During the weekend, most of the parallel (and -R y) jobs started 
>>>>> running, but now there is something fishy in my queues:
>>>>> 
>>>>> The first 3 jobs in my waiting queue belong to user1. All 3 jobs request 
>>>>> -pe mpich_round 12, -R y and -l h_vmem=4G (h_vmem is set to consumable = 
>>>>> YES, not JOB).
>>>> How much memory did you specify in the exechost definition, i.e. 
>>>> what's physically in the machine?
>>>> 
>>>> -- Reuti
>>> 26 nodes have 96GB of RAM. One node has 48GB.
>> And you defined it at the exechost level under "complex_values"? - Reuti
> 
> Yes, on all nodes.
> # qconf -se c0-0 | grep h_vmem
> complex_values        local_disk=400G,slots=12,h_vmem=96G

Good. What is the definition of the requested PE - any special "allocation_rule"?
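
You can check it with e.g.

    # qconf -sp mpich_round

(PE name taken from the job request above) and look at the "allocation_rule" entry: $fill_up, $round_robin, $pe_slots or a fixed number of slots per host.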


> PS: I've been told that there are some problems with local_disk, but 
> currently no job is making use of it.

It may be a custom load sensor; it's nothing SGE provides by default.
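
Such a sensor is just a script started by sge_execd: it blocks on stdin, and each time a line arrives it answers with a begin/end block of host:complex:value lines. A minimal sketch in shell - the /scratch filesystem and the df-based value are assumptions for illustration only, the real sensor on your cluster may do something entirely different:

    #!/bin/sh
    # Minimal load-sensor sketch: report free space on an assumed local
    # /scratch filesystem as the value of the "local_disk" complex.
    HOST=`hostname`
    while read input; do
        # sge_execd sends a line each load interval; "quit" means shut down
        if [ "$input" = "quit" ]; then
            exit 0
        fi
        FREE=`df -P -BG /scratch | awk 'NR==2 {print $4}'`   # e.g. "59G"
        echo "begin"
        echo "$HOST:local_disk:$FREE"
        echo "end"
    done

It would be attached via the "load_sensor" parameter of the cluster or host configuration (qconf -mconf).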


>>> Currently the nodes have between 4 and 10 free slots and between 26 and 82.1 GB of free memory each.
>>> 
>>> The first jobs in my waiting queue (after the 3 reserving ones) request 
>>> a measly 0.9G, 3G and 12G, all with slots=1 and -R n. None of them is 
>>> scheduled. But if I manually increase their priority so they are put BEFORE 
>>> the 3 -R y jobs, they are immediately scheduled.
>>> 
>>>> 
>>>>> This user already has one such job running. User1 has an RQS that 
>>>>> limits him to 12 slots in the whole cluster, so the 3 waiting 
>>>>> jobs will not be able to run until the first one finishes.
>>>>> 
>>>>> This is the current schedule log:
>>>>> 
>>>>> # grep "::::\|RESERVING" schedule | tail -200 | grep "::::\|Q:all" | tail -37 | sort
>>>>> ::::::::
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
>>>>> 
>>>>> 
>>>>> Right now, the cluster is using 190 slots of 320 total. The schedule log 
>>>>> says that the 3 waiting jobs from user1 are the only jobs making any kind 
>>>>> of reservation. These jobs are reserving a total of 36 cores. These 3 
>>>>> jobs are effectively blocking 36 already-free slots, because the RQS 
>>>>> doesn't allow user1 to use more than 12 slots at once. This is 
>>>>> not "nice", but I understand that the scheduler has its limitations and 
>>>>> cannot predict the future.
>>>>> 
>>>>> Taking into account the running jobs plus the slots & memory locked by 
>>>>> the reserving jobs, there is a grand total of 226 slots locked, leaving 
>>>>> 94 free slots.
>>>>> 
>>>>> Here comes the problem: even though there are 94 free slots and lots of 
>>>>> spare memory, NONE of the 4300 waiting jobs is running. There are nodes 
>>>>> with 6 free slots and 59 GB of free RAM, but none of the waiting jobs is 
>>>>> scheduled. New jobs only start running when one of the 190 slots occupied 
>>>>> by running jobs is freed. None of these other waiting jobs requests 
>>>>> -R y, -pe or h_rt.
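
(Side note: with "schedd_job_info true" set, as in your scheduler configuration below, the scheduler records per-job diagnostics; something like

    qstat -j <jobid>

on one of the pending jobs should show a "scheduling info:" section explaining why it was skipped.)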
>>>>> 
>>>>> 
>>>>> Additionally, this is creating some odd behaviour. It seems that, on each 
>>>>> scheduler run, the scheduler tries to start jobs in those "blocked slots", 
>>>>> but fails for no apparent reason. Some of the jobs even try to 
>>>>> start twice, but almost none (generally none at all) gets to run:
>>>>> 
>>>>> # tail -2000 schedule | grep -A 1000 "::::::" | grep "Q:all" | grep STARTING | sort
>>>>> 2734121:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734122:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734123:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734124:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734125:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734126:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734127:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734128:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734129:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734130:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734131:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734132:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734133:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734134:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734135:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734136:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734137:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734138:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734139:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734140:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734141:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734142:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734143:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734144:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734145:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734146:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734147:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734148:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734149:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734150:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734151:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734152:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734153:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734154:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734155:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734156:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734157:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734158:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734159:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734160:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2734161:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735158:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735159:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735160:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735161:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735162:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735163:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735164:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735165:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735166:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735167:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735168:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735169:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735170:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735171:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735172:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735173:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735174:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735175:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735176:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735177:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735178:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735179:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735180:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735181:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735182:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735183:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735184:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735185:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735186:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735187:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735188:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735189:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735190:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735191:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735192:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2735193:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743479:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743480:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743481:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743482:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743483:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743484:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743485:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743486:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743487:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743488:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743489:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743490:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743491:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743492:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743493:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743494:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743495:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743496:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743497:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743498:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743499:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743500:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743501:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743502:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743503:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743504:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743505:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743506:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743507:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743508:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743509:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743510:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743511:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743512:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743513:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743514:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743515:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743516:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743517:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 2743518:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000
>>>>> 
>>>>> 
>>>>> Even though these jobs appear listed as "STARTING", they are not running 
>>>>> at all; they just produce a new "STARTING" message on each scheduling 
>>>>> interval.
>>>>> 
>>>>> Why are the reservations blocking a third of the cluster? It shouldn't 
>>>>> be a backfilling issue: they are blocking three times the number of slots 
>>>>> actually reserved. And why can't the "STARTING" jobs run?
>>>>> 
>>>>> Txema
>>>>> 
>>>>> 
>>>>> 
>>>>> On 07/10/13 09:28, Christian Krause wrote:
>>>>>> Hello,
>>>>>> 
>>>>>> We solved it by setting `h_rt` to FORCED in the complex list:
>>>>>> 
>>>>>>     #name   shortcut   type   relop   requestable   consumable   default   urgency
>>>>>>     #---------------------------------------------------------------------------
>>>>>>     h_rt    h_rt       TIME   <=      FORCED        YES           0:0:0     0
>>>>>> 
>>>>>> And we have a JSV rejecting jobs that don't request it (because they 
>>>>>> would be pending indefinitely unless you have a default duration or 
>>>>>> use qalter).
>>>>>> 
>>>>>> You could also use a JSV to enforce that only jobs with large resource 
>>>>>> requests (in your case more than some number of slots) are able to 
>>>>>> request a reservation, e.g.:
>>>>>> 
>>>>>>     # pseudo JSV code
>>>>>>     SLOT_RESERVATION_THRESHOLD=...
>>>>>>     if slots < SLOT_RESERVATION_THRESHOLD then
>>>>>>         "disable reservation / reject"
>>>>>>     else
>>>>>>         "enable reservation"
>>>>>>     fi
>>>>>> 
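
As an aside: a concrete client-side JSV along these lines might look like the sketch below (shell, using the jsv_* helpers shipped in $SGE_ROOT/util/resources/jsv/jsv_include.sh; the threshold of 12 slots is only an assumption to match the jobs discussed above):

    #!/bin/sh
    # JSV sketch: enable -R y only for parallel jobs requesting at least
    # SLOT_RESERVATION_THRESHOLD slots. The threshold is an assumed site policy.
    . $SGE_ROOT/util/resources/jsv/jsv_include.sh

    SLOT_RESERVATION_THRESHOLD=12

    jsv_on_start()
    {
        return
    }

    jsv_on_verify()
    {
        pe_min=`jsv_get_param pe_min`
        if [ -n "$pe_min" ] && [ "$pe_min" -ge "$SLOT_RESERVATION_THRESHOLD" ]; then
            jsv_set_param R y    # large parallel job: reserve slots
        else
            jsv_set_param R n    # small job: backfill only, no reservation
        fi
        jsv_correct "Job was modified before it was accepted"
        return
    }

    jsv_main

It could be attached per submission via the -jsv option or cluster-wide via jsv_url in qconf -mconf.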
>>>>>> 
>>>>>> On Fri, Oct 04, 2013 at 04:25:29PM +0200, Txema Heredia wrote:
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I have a 27-node cluster. Currently there are 320 out of 320 slots
>>>>>>> filled up, all by jobs requesting a single slot.
>>>>>>> 
>>>>>>> At the top of my waiting queue there are 28 different jobs
>>>>>>> requesting 3 to 12 cores using two different parallel environments.
>>>>>>> All these jobs are requesting -R y. They are being ignored and
>>>>>>> overrun by the myriad of 1-slot jobs behind them in the
>>>>>>> waiting queue.
>>>>>>> 
>>>>>>> I have enabled the scheduler logging. During the last 4 hours, it
>>>>>>> has logged 724 new jobs starting, across all 27 nodes. Not a single
>>>>>>> job on the system is requesting -l h_rt, but single-core jobs keep
>>>>>>> being scheduled and all the parallel jobs are starving.
>>>>>>> 
>>>>>>> As far as I understand, the backfilling is killing my reservations,
>>>>>>> even though no job is requesting any kind of run time. But if I set the
>>>>>>> "default_duration" to INFINITY, all the RESERVING log messages
>>>>>>> disappear.
>>>>>>> 
>>>>>>> Additionally, for some odd reason, I only receive RESERVING messages
>>>>>>> from the jobs requesting a given number of slots (-pe whatever N).
>>>>>>> The jobs requesting a slot-range (-pe threaded 4-10) seem to reserve
>>>>>>> nothing.
>>>>>>> 
>>>>>>> My scheduler configuration is as follows:
>>>>>>> 
>>>>>>> # qconf -ssconf
>>>>>>> algorithm                         default
>>>>>>> schedule_interval                 0:0:5
>>>>>>> maxujobs                          0
>>>>>>> queue_sort_method                 load
>>>>>>> job_load_adjustments              np_load_avg=0.50
>>>>>>> load_adjustment_decay_time        0:7:30
>>>>>>> load_formula                      np_load_avg
>>>>>>> schedd_job_info                   true
>>>>>>> flush_submit_sec                  0
>>>>>>> flush_finish_sec                  0
>>>>>>> params                            MONITOR=1
>>>>>>> reprioritize_interval             0:0:0
>>>>>>> halftime                          168
>>>>>>> usage_weight_list                 cpu=0.187000,mem=0.116000,io=0.697000
>>>>>>> compensation_factor               5.000000
>>>>>>> weight_user                       0.250000
>>>>>>> weight_project                    0.250000
>>>>>>> weight_department                 0.250000
>>>>>>> weight_job                        0.250000
>>>>>>> weight_tickets_functional         1000000000
>>>>>>> weight_tickets_share              1000000000
>>>>>>> share_override_tickets            TRUE
>>>>>>> share_functional_shares           TRUE
>>>>>>> max_functional_jobs_to_schedule   200
>>>>>>> report_pjob_tickets               TRUE
>>>>>>> max_pending_tasks_per_job         50
>>>>>>> halflife_decay_list               none
>>>>>>> policy_hierarchy                  OSF
>>>>>>> weight_ticket                     0.010000
>>>>>>> weight_waiting_time               0.000000
>>>>>>> weight_deadline                   3600000.000000
>>>>>>> weight_urgency                    0.100000
>>>>>>> weight_priority                   1.000000
>>>>>>> max_reservation                   50
>>>>>>> default_duration                  24:00:00
>>>>>>> 
>>>>>>> 
>>>>>>> I have also tested it with params PROFILE=1 and default_duration
>>>>>>> INFINITY. But when I set that, not a single reservation is logged in
>>>>>>> /opt/gridengine/default/common/schedule and new jobs keep starting.
>>>>>>> 
>>>>>>> 
>>>>>>> What am I missing? Is it possible to kill the backfilling? Are my
>>>>>>> reservations really working?
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> Txema
>>>>> 
> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
