On 29.02.2012 at 18:47, Reuti wrote:

> Hi,
> 
> On 29.02.2012 at 18:07, Txema Heredia Genestar wrote:
> 
>> I want to control the usage of the local disk of our execution nodes. As far
>> as I have found, the only related option offered by SGE is the h_fsize
>> limit. But that will not work, because it only limits the maximum size of
>> any single file created on any filesystem, be it the local disk or the NFS
>> shared volume.
>> 
>> What I came up with is:
>> 1- Create a load sensor for the usage percentage of the local disk of each 
>> host.
>> 2- Add that sensor to the Suspend Threshold of all queues.
>> 3- Create a consumable attribute "local_disk", with default value = 0KB 
>> (most jobs won't make any use of it)
>> 4- Set the value of "local_disk" in each host
>> 
>> That way, whenever a job is sent, if it requests no disk space, nothing
>> happens. If the job explicitly requests disk space, it will be scheduled
>> to a host with enough free space. If that job exceeds the requested disk
>> space, "usually" nothing will happen. But if it exceeds its disk space on
>> a node where several other jobs are using that disk, then instead of the
>> disk filling up and the jobs crashing for lack of space, all jobs will be
>> suspended until the problem is fixed manually.
>> I understand that this is not a true resource limit like h_vmem, and that
>> it requires manual conflict resolution.
>> 
>> Does anyone have a better idea?
> 
> A load sensor is covered in:
> 
> http://arc.liv.ac.uk/SGE/howto/loadsensor.html
> 
> I use it as a load_threshold that triggers once tmpfree, the free space in
> /tmp, falls below 1 GB.
> 
> In addition you can make tmpfree consumable and attach an initial value to 
> each exechost which can be requested. 
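
A minimal sketch of such a load sensor, written here in Python rather than taken from the howto. It assumes the usual sensor protocol: sge_execd writes a line to the sensor's stdin for each request, the sensor answers with a begin/end block of host:name:value lines, and it exits on "quit". The function names, the choice to report bytes, and the use of socket.gethostname() for the host name are assumptions for illustration; the name must match what SGE knows the host as.

```python
import shutil
import socket
import sys


def format_report(host, name, value):
    """Build one load report in the begin/end framing that sge_execd reads."""
    return f"begin\n{host}:{name}:{value}\nend\n"


def run_sensor(stdin, stdout, path="/tmp", name="tmpfree"):
    """Answer every request line with one report; stop on 'quit' or EOF."""
    host = socket.gethostname()  # assumption: matches the SGE host name
    for line in stdin:
        if line.strip() == "quit":
            break
        free_bytes = shutil.disk_usage(path).free
        stdout.write(format_report(host, name, free_bytes))
        stdout.flush()  # execd waits for the complete report


# To use this as a real sensor, end the script with:
#     run_sensor(sys.stdin, sys.stdout)
```

The tmpfree complex still has to be defined in the complex configuration and the script registered as load_sensor before the value shows up on the hosts.
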
> 
> 
>> Thanks in advance,
>> 
>> Txema
>> 
>> PS: Another possible option I thought about would be a prolog script (and
>> the equivalent epilog cleanup) that, before the job starts:
>> 1- Creates a group for the jobid, and assigns the group to the user.
>> 2- Creates a group quota for the local disk with the requested local_disk 
>> value

Aha, I found this:

http://arc.liv.ac.uk/pipermail/gridengine-users/2006-November/012125.html

As Rayson mentions there, the group already exists, so creating the quota is
the easiest part.
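
A rough sketch of the quota step, assuming the Linux quota tools are installed and group quotas are enabled on the local filesystem. It only builds the setquota command line; the helper name and its parameters are hypothetical, and how the prolog learns the job's additional group id is left open here.

```python
def setquota_command(gid, limit_kb, filesystem):
    """Build a setquota call that gives group `gid` a hard block limit."""
    return [
        "setquota", "-g", str(gid),
        "0",             # block soft limit: 0 means no limit
        str(limit_kb),   # block hard limit, counted in 1 KiB blocks
        "0", "0",        # inode soft and hard limits: no limit
        filesystem,
    ]
```

In the prolog the list could be handed to subprocess.run(cmd, check=True) as root; calling the helper with limit_kb=0 in the epilog would lift the limit again.
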

-- Reuti


> And then terminates the job?
> 
> 
>> But that would be much more complicated and could add some unwanted 
>> complexity to the whole system.
> 
> Do your users stay in $TMPDIR? Then I think it would be easier to run
> `du -s *.all.q` and check whether any job directory is above its request.
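
The same check could be sketched in Python, assuming the per-job scratch directories are named <jobid>.<taskid>.all.q under one base directory. The requested_bytes mapping (job id to requested bytes) is a hypothetical input that would have to be filled from the jobs' resource requests; jobs missing from it are treated as having requested 0, like the default above. Unlike du, this sums apparent file sizes rather than allocated blocks.

```python
import glob
import os


def dir_size_bytes(path):
    """Sum the sizes of all files below path without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished while the job was running
            total += st.st_size
    return total


def jobs_over_request(tmp_base, requested_bytes, pattern="*.all.q"):
    """Return {job_id: used_bytes} for scratch directories above their request."""
    offenders = {}
    for job_dir in glob.glob(os.path.join(tmp_base, pattern)):
        job_id = os.path.basename(job_dir).split(".")[0]
        used = dir_size_bytes(job_dir)
        if used > requested_bytes.get(job_id, 0):
            offenders[job_id] = used
    return offenders
```
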
> 
> NB: There is a suspend_threshold for queues, but unfortunately not for each 
> individual job on its own.
> 
> ===
> 
> Another approach, if the jobs stay on one node:
> 
> - in the job prolog create a file with the requested space
> - format and mount it on $TMPDIR as loop device
> - in the epilog it can be removed again
> 
> Well, creating and formatting will take some time, but the job can never
> exceed the requested space and it's guaranteed to be available.
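
The three steps above can be sketched as command lists for a prolog and an epilog that run as root. The choice of ext4, the image path, and the helper names are assumptions for illustration. dd is used instead of a sparse file so that the space really is allocated up front, and the chown is there because a freshly formatted filesystem is owned by root.

```python
import subprocess


def prolog_commands(image, mountpoint, size_mb, owner):
    """Commands that create, format and mount a fixed-size scratch image."""
    return [
        # Write real zeros so the blocks are allocated now, not on demand.
        ["dd", "if=/dev/zero", f"of={image}", "bs=1M", f"count={size_mb}"],
        # -F lets mkfs work on a regular file, -q keeps it quiet.
        ["mkfs.ext4", "-q", "-F", image],
        ["mount", "-o", "loop", image, mountpoint],
        # The new filesystem's root belongs to root; give it to the job owner.
        ["chown", owner, mountpoint],
    ]


def epilog_commands(image, mountpoint):
    """Commands that unmount the scratch image and delete it."""
    return [
        ["umount", mountpoint],
        ["rm", "-f", image],
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Note that the usable space will be somewhat below size_mb because of filesystem metadata and reserved blocks.
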
> 
> -- Reuti
> 
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users