I needed to increase the priority of the jobs of one user and I wasn't able to do so. No matter how many times I issued qalter -p 1024 -u user, the waiting queue remained the same. I have just restarted the sge_qmaster daemon, et voilà: the jobs got their proper priority and every job that was able to run was scheduled. After simply restarting it, my cluster is now using 292 (+36 reserved) slots out of 320 total.
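
(In case anyone hits the same wall: the "reboot" was nothing more exotic than a clean stop and start of the qmaster, roughly like this; the sgemaster wrapper location depends on the install, on this layout it sits under /opt/gridengine/default/common:)

# qconf -km
# /opt/gridengine/default/common/sgemaster start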

So it seems all this is a matter of qmaster degradation. This raises further questions, like how it is possible that the qmaster degraded this far only 4 days after turning on the reservations...

Thanks for all,

Txema


On 07/10/13 16:18, Txema Heredia wrote:
On 07/10/13 16:12, Reuti wrote:
On 07.10.2013 at 16:09, Txema Heredia wrote:

On 07/10/13 16:00, Reuti wrote:
On 07.10.2013 at 15:59, Txema Heredia wrote:

On 07/10/13 14:58, Reuti wrote:
Hi,

On 07.10.2013 at 13:15, Txema Heredia wrote:

The problem is that, right now, making h_rt mandatory is not an option. So we need to work on the assumption that all jobs will last to infinity and beyond.

Right now, the scheduler configuration is:
max_reservation 50
default_duration 24:00:00

Over the weekend, most of the parallel (and -R y) jobs started running, but now there is something fishy in my queues:

The first 3 jobs in my waiting queue belong to user1. All 3 jobs request -pe mpich_round 12, -R y and -l h_vmem=4G (h_vmem is set to consumable = YES, not JOB).
What amount of memory did you specify in the exechost definition, i.e. what's physically in the machine?

-- Reuti
26 nodes have 96GB of ram. One node has 48GB.
And you defined it at the exechost level under "complex_values"? - Reuti
Yes, on all nodes.
# qconf -se c0-0 | grep h_vmem
complex_values        local_disk=400G,slots=12,h_vmem=96G
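
(For completeness, in case anyone wants to replicate the setup: a host consumable like that is normally attached with something along these lines, c0-0 being the example node above:)

# qconf -mattr exechost complex_values h_vmem=96G c0-0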
Good, what is the definition of the requested PE - any special "allocation_rule"?

Round robin

# qconf -sp mpich_round
pe_name            mpich_round
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE


PS: I've been told that there are some problems with local_disk, but currently no job is making use of it.
It may be a custom load sensor, it's nothing SGE provides by default.

Yes, it's simply a consumable attribute that does nothing. I have just been told that host-defined consumable attributes plus parallel environments sometimes don't behave properly (over-requesting and such), but that shouldn't apply here because none of the jobs is using it. We can ignore it.

Currently the nodes range from 4 to 10 free slots and from 26 to 82.1 GB of free memory.

The first jobs in my waiting queue (after the 3 reserving ones) require a measly 0.9G, 3G and 12G, all with slots=1 and -R n. None of them is scheduled. But if I manually increase their priority so they are put BEFORE the 3 -R y jobs, they are immediately scheduled.

This user already has one job like these running. User1 has an RQS that limits him to only 12 slots in the whole cluster. Thus the 3 waiting jobs will not be able to run until the first one finishes.
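
(For reference, that limit is a plain resource quota set; the real one has a different name, but it is essentially of this shape:)

# qconf -srqs
{
   name         user1_slot_limit
   description  "cap user1 at 12 slots cluster-wide"
   enabled      TRUE
   limit        users user1 to slots=12
}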

This is the current schedule log:

# grep "::::\|RESERVING" schedule | tail -200 | grep "::::\|Q:all" | tail -37 | sort
::::::::
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734185:1:RESERVING:1381142325:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734186:1:RESERVING:1381228785:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000
2734187:1:RESERVING:1381315245:86460:Q:[email protected]:slots:1.000000

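(For anyone else digging through the schedule file: as far as I can tell each record is laid out as

    job_id:task_id:state:start_time:duration:level:queue_instance:resource:amount

so the three pending jobs above are each holding 12 one-slot reservations for 86460 seconds, i.e. the 24h default_duration plus the default 60-second duration offset.)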

Right now, the cluster is using 190 slots out of 320 total. The schedule log says that the 3 waiting jobs from user1 are the only jobs making any kind of reservation. These jobs are reserving a total of 36 cores. These 3 jobs are effectively blocking 36 already-free slots because the RQS doesn't allow user1 to use more than 12 slots at once. This is not "nice", but I understand that the scheduler has its limitations and cannot predict the future.

Taking into account the running jobs plus the slots and memory locked by the reserving jobs, there is a grand total of 226 slots locked, leaving 94 free slots.

Here comes the problem: even though there are 94 free slots and lots of spare memory, NONE of the 4300 waiting jobs is running. There are nodes with 6 free slots and 59 GB of free RAM, but none of the waiting jobs is scheduled. New jobs only start running when one of the 190 slots occupied by running jobs is freed. None of these other waiting jobs is requesting -R y, -pe or h_rt.


Additionally, this is creating some odd behaviour. It seems that, on each scheduler run, the scheduler tries to start jobs in those "blocked slots", but fails for no apparent reason. Some of the jobs even try to start twice, yet almost none (generally none at all) actually gets to run:

# tail -2000 schedule | grep -A 1000 "::::::" | grep "Q:all" | grep STARTING | sort 2734121:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734122:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734123:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734124:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734125:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734126:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734127:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734128:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734129:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734130:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734131:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734132:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734133:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734134:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734135:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734136:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734137:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734138:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734139:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734140:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734141:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734142:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734143:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734144:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734145:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734146:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734147:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734148:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734149:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734150:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734151:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734152:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734153:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734154:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734155:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734156:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734157:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734158:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734159:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734160:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2734161:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735158:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735159:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735160:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735161:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735162:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735163:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735164:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735165:1:STARTING:1381144160:86460:Q:[email 
protected]:slots:1.000000 2735166:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735167:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735168:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735169:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735170:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735171:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735172:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735173:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735174:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735175:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735176:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735177:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735178:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735179:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735180:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735181:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735182:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735183:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735184:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735185:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735186:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735187:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735188:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735189:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735190:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735191:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735192:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2735193:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743479:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743480:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743481:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743482:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743483:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743484:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743485:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743486:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743487:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743488:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743489:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743490:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743491:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743492:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743493:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743494:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743495:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743496:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743497:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743498:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743499:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743500:1:STARTING:1381144160:86460:Q:[email 
protected]:slots:1.000000 2743501:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743502:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743503:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743504:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743505:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743506:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743507:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743508:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743509:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743510:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743511:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743512:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743513:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743514:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743515:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743516:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743517:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000 2743518:1:STARTING:1381144160:86460:Q:[email protected]:slots:1.000000


Even though the jobs appear listed here as "starting", they are not running at all; they just issue a new "starting" message on each scheduling interval.

Why are the reservations blocking a third of the cluster? It shouldn't be a backfilling issue: they are blocking roughly three times the number of slots actually reserved. And why can't the "starting" jobs run?

Txema



On 07/10/13 09:28, Christian Krause wrote:
Hello,

We solved it by setting `h_rt` to FORCED in the complex list:

#name     shortcut   type   relop   requestable   consumable   default   urgency
#--------------------------------------------------------------------------------
h_rt      h_rt       TIME   <=      FORCED        YES          0:0:0     0
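
(Jobs then have to request a run time explicitly or they will never be dispatched, e.g., with myjob.sh as a stand-in for the real job script:)

# qsub -l h_rt=24:00:00 myjob.sh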

And we have a JSV rejecting jobs that don't request it (because otherwise they would stay pending indefinitely,
unless you have a default duration or use qalter).

You could also use a JSV to enforce that only jobs with large resource requests (in your case, more than some
number of slots) are able to request a reservation, e.g.:

     # pseudo JSV code
     SLOT_RESERVATION_THRESHOLD=...
     if slots < SLOT_RESERVATION_THRESHOLD then
         "disable reservation / reject"
     else
         "enable reservation"
     fi
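
Fleshed out as a client-side JSV it could look roughly like this (an untested sketch; the threshold and whether to reject or silently drop -R are site decisions). It only relies on the helper functions shipped in $SGE_ROOT/util/resources/jsv/jsv_include.sh:

#!/bin/sh
# rough client-side JSV: drop the reservation flag on small jobs
# (threshold and behaviour are site choices, adjust as needed)

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh

SLOT_RESERVATION_THRESHOLD=8

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   # a job without a PE request is a 1-slot job
   if [ "$(jsv_is_param pe_name)" = "true" ]; then
      slots=$(jsv_get_param pe_min)
   else
      slots=1
   fi

   if [ "$(jsv_get_param R)" = "y" ] && [ "$slots" -lt "$SLOT_RESERVATION_THRESHOLD" ]; then
      # alternatively: jsv_reject "reservation only allowed for >= $SLOT_RESERVATION_THRESHOLD slots"
      jsv_set_param R n
      jsv_correct "reservation removed for small job"
      return
   fi

   jsv_accept "OK"
}

jsv_main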


On Fri, Oct 04, 2013 at 04:25:29PM +0200, Txema Heredia wrote:
Hi all,

I have a 27-node cluster. Currently 320 out of 320 slots are
filled up, all by jobs requesting a single slot.

At the top of my waiting queue there are 28 different jobs
requesting 3 to 12 cores using two different parallel environments.
All these jobs are requesting -R y. They are being ignored and
overrun by the myriad of 1-slot jobs behind them in the
waiting queue.

I have enabled the scheduler logging. During the last 4 hours it
has logged 724 new jobs starting, across all 27 nodes. Not a single job on the system is requesting -l h_rt, but single-core jobs keep
being scheduled and all the parallel jobs are starving.

As far as I understand, backfilling is killing my reservations,
even though no job is requesting any kind of run time, yet if I set the
"default_duration" to INFINITY, all the RESERVING log messages
disappear.

Additionally, for some odd reason, I only receive RESERVING messages from the jobs requesting a fixed number of slots (-pe whatever N). The jobs requesting a slot range (-pe threaded 4-10) seem to reserve
nothing.

My scheduler configuration is as follows:

# qconf -ssconf
algorithm                         default
schedule_interval                 0:0:5
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              np_load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      np_load_avg
schedd_job_info                   true
flush_submit_sec                  0
flush_finish_sec                  0
params                            MONITOR=1
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=0.187000,mem=0.116000,io=0.697000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         1000000000
weight_tickets_share              1000000000
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   200
report_pjob_tickets               TRUE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OSF
weight_ticket                     0.010000
weight_waiting_time               0.000000
weight_deadline                   3600000.000000
weight_urgency                    0.100000
weight_priority                   1.000000
max_reservation                   50
default_duration                  24:00:00


I have also tested it with params PROFILE=1 and default_duration
INFINITY. But when I do, not a single reservation is logged in /opt/gridengine/default/common/schedule and new jobs keep starting.
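
(For the record, the two relevant knobs are edited and tested like this; -tsm just forces an immediate scheduling run:)

# qconf -msconf
# qconf -tsm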


What am I missing? Is it possible to kill the backfilling? Are my
reservations really working?

Thanks in advance,

Txema