Re: [gridengine users] Restricting / controlling the access to $TMPDIR

William Hay Wed, 29 Feb 2012 13:27:10 -0800

On 29 February 2012 17:47, Reuti <[email protected]> wrote:
> Hi,
>
> On 29.02.2012 at 18:07, Txema Heredia Genestar wrote:
>
>> I want to control the usage of the local disk of our execution nodes. As far
>> as I can tell, the only related option offered by SGE is the h_fsize
>> limit. But that will not work, because it only limits the maximum size
>> of each created file on any filesystem, be it the local disk or the NFS
>> shared volume.
>>
>> What I came up with is:
>> 1- Create a load sensor for the usage percentage of the local disk of each 
>> host.
>> 2- Add that sensor to the Suspend Threshold of all queues.
>> 3- Create a consumable attribute "local_disk", with default value = 0KB 
>> (most jobs won't make any use of it)
>> 4- Set the value of "local_disk" in each host
>>
>> That way, whenever a job is submitted, if it requests no disk space, nothing
>> happens. If the job explicitly requests disk space, it will be
>> scheduled to a host with enough free space. If that job exceeds the
>> requested disk space, "usually" nothing will happen. But if the job exceeds
>> its disk space on a node with several other jobs using that disk, instead of
>> filling the disk and crashing the jobs due to lack of space, all jobs will be
>> suspended until the problem is manually fixed.
>> I understand that this is not a true resource limit as with h_vmem, and it
>> requires manual conflict resolution.
>>
>> Does anyone have a better idea?
>
> A load sensor is covered in:
>
> http://arc.liv.ac.uk/SGE/howto/loadsensor.html
>
> I use it for a load_threshold if tmpfree falls below 1 GB left in /tmp.
>
> In addition you can make tmpfree consumable and attach an initial value to 
> each exechost which can be requested.
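A minimal sketch of such a load sensor, following the protocol described in the howto: execd writes a line to the sensor's stdin whenever it wants a report, the sensor answers with a begin/end block, and the line "quit" ends it. The function names, the choice of Python, and reporting the value in bytes are assumptions made for illustration; the complex "tmpfree" has to be defined in the cluster before the value is picked up.

```python
import shutil


def tmpfree_report(host, path="/tmp"):
    """Return one load report block with the free space on `path` in bytes."""
    free = shutil.disk_usage(path).free
    # Report format: begin, then host:complex_name:value, then end.
    return f"begin\n{host}:tmpfree:{free}\nend\n"


def run_sensor(stdin, stdout, host, path="/tmp"):
    """Answer every line read from `stdin` with a report until "quit" arrives."""
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(tmpfree_report(host, path))
        stdout.flush()  # execd reads from a pipe, so flush after each report
```

To use it as a real sensor, the script would end with a call such as run_sensor(sys.stdin, sys.stdout, socket.gethostname()), assuming the host name returned there matches the name the cluster knows the exechost by.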
>
>
>> Thanks in advance,
>>
>> Txema
>>
>> PS: Another possible option I thought about would be a prolog script (and 
>> the epilog cleanup equivalent) that, before the job starts:
>> 1- Creates a group for the jobid, and assigns the group to the user.
>> 2- Creates a group quota for the local disk with the requested local_disk 
>> value
>
> And then terminates the job?
>
>
>> But that would be much more complicated and could add some unwanted 
>> complexity to the whole system.
>
> Do your users stay in $TMPDIR? Then I think it would be easier to run
> `du -s *.all.q` and check whether any directory is above its request.
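The same check can be sketched in Python, assuming the per-job directories under the local tmpdir are named like 12345.1.all.q and that the requested sizes are already available as a mapping from directory name to bytes (how that mapping is obtained, for example from qstat, is left open here). Both function names are hypothetical. Unlike du, this sums apparent file sizes, not allocated blocks, so sparse files count in full.

```python
import os


def tree_bytes(root):
    """Sum the apparent file sizes below `root` without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # the job removed the file while we were scanning
    return total


def jobs_over_request(tmp_root, requested):
    """Return {job_dir: used_bytes} for job directories above their request."""
    over = {}
    for job_dir, limit in requested.items():
        path = os.path.join(tmp_root, job_dir)
        if os.path.isdir(path):
            used = tree_bytes(path)
            if used > limit:
                over[job_dir] = used
    return over
```

A cron job or load sensor could call jobs_over_request("/tmp", requested) periodically and suspend or report the offending jobs.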
>
> NB: There is a suspend_threshold for queues, but unfortunately not for each 
> individual job on its own.
>
> ===
>
> Another approach, if the jobs stay in one node:
>
> - in the job prolog create a file with the requested space
> - format and mount it on $TMPDIR as loop device
> - in the epilog it can be removed again
>
> Well, creating and formatting will take some time, but the job can never exceed
> the requested space, and that space is guaranteed to be available.
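The three steps above could look roughly like this, as a sketch only. It assumes the prolog and epilog run as root, that ext4 is an acceptable filesystem, and that the image path, size and job owner are passed in by the caller; all function and parameter names are hypothetical. dd is used so the blocks are really allocated up front, and the chown is needed because a freshly created filesystem belongs to root.

```python
import subprocess


def loop_prolog_commands(image, size_mb, tmpdir, owner):
    """Commands that create, format and mount a fixed-size image on `tmpdir`."""
    return [
        ["dd", "if=/dev/zero", f"of={image}", "bs=1M", f"count={size_mb}"],
        ["mkfs.ext4", "-q", "-F", image],  # -F: the target is a regular file
        ["mount", "-o", "loop", image, tmpdir],
        ["chown", owner, tmpdir],  # hand the new filesystem root to the job owner
    ]


def loop_epilog_commands(image, tmpdir):
    """Commands that unmount the image and delete it again."""
    return [
        ["umount", tmpdir],
        ["rm", "-f", image],
    ]


def run_all(commands):
    """Run the commands in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Keeping the command lists separate from run_all makes it possible to print and review them before anything is executed as root.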
>


Rather than creating a file you could mount an appropriately sized
tmpfs, assuming most of your disk is already formatted as swap.
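As a rough sketch, the mount command for such a tmpfs could be built like this. The function names and the idea of passing the job owner's numeric uid and gid are assumptions; size, uid, gid and mode are standard tmpfs mount options. tmpfs pages live in RAM and are pushed out to swap under memory pressure, which is why the swap size matters.

```python
def tmpfs_mount_argv(tmpdir, size_mb, uid, gid):
    """Build the mount command for a size-limited tmpfs owned by the job user."""
    options = f"size={size_mb}m,uid={uid},gid={gid},mode=0700"
    return ["mount", "-t", "tmpfs", "-o", options, "tmpfs", tmpdir]


def tmpfs_umount_argv(tmpdir):
    """Build the matching unmount command for the epilog."""
    return ["umount", tmpdir]
```

The prolog would run the first command as root before the job starts and the epilog the second one afterwards; unmounting discards the contents, so no separate cleanup is needed.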

William

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
