Hi,

On 07.03.2013 at 19:24, Pablo Escobar wrote:

> I have previously used load_sensors to disable a queue when one of my
> filesystems reaches 98% usage, but I don't know how to do the

What did you observe there? Usually it disables a queue instance, never a
complete queue, so it is already doing what you would like to have.

I assume the threshold was crossed for all queue instances at the same time,
so it looked as if it were acting at a global level, although each instance
was disabled on its own.


> same to disable a single exec node, not a full queue. In the man page I
> see the "suspend_threshold" is only available for queues but not for exec
> nodes.

load_thresholds is the entry you need. If it tests a BOOL variable that is
set by the load_sensor, you can disable the desired exec host: no more jobs
will be scheduled to a machine that is in alarm state.
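
As an illustration, a load sensor for this could look roughly like the sketch
below. It follows the usual load sensor protocol (read a line from stdin,
answer with a begin/end block of host:name:value lines, exit on "quit"). The
complex name home_ok, the checked path, and reporting the BOOL as 1/0 are
assumptions made for the example; the complex would have to be defined in the
complex configuration and referenced in the threshold yourself.

```python
#!/usr/bin/env python3
"""Minimal load sensor sketch: reports whether a directory is readable."""
import os
import socket
import sys

CHECK_PATH = "/home"      # assumption: the filesystem to probe
COMPLEX_NAME = "home_ok"  # hypothetical complex name, must exist in the complex config


def path_accessible(path):
    """Return True if the directory can be listed, False on any OS error.

    Note: on a hung NFS mount this call may block instead of failing.
    """
    try:
        os.listdir(path)
        return True
    except OSError:
        return False


def build_report(host, name, ok):
    """Format one load report block; the BOOL is sent as 1/0 (assumption)."""
    return "begin\n{}:{}:{}\nend\n".format(host, name, 1 if ok else 0)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Assumption: the local hostname matches the name the qmaster knows.
    host = socket.gethostname()
    for line in stdin:
        if line.strip() == "quit":
            break
        # Any other input line is treated as a request for a new report.
        stdout.write(build_report(host, COMPLEX_NAME, path_accessible(CHECK_PATH)))
        stdout.flush()


if __name__ == "__main__":
    main()
```

You can test it by hand: start the script, press Enter to get one report
block, and type quit to stop it.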

-- Reuti


> I would like to disable single exec nodes in case the node can't access
> /home. Exactly what I am trying to achieve is to run a load_sensor on
> every exec node just doing 'ls /home/username', and if this
> load_sensor returns FALSE (can't access the filesystem) then just
> disable the node so it doesn't accept more jobs until the problem is
> solved.
> 
> Is this possible using load_sensors, or should I try a different approach?
> 
> Many thanks in advance for any help
> Pablo.
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

