Hi,

On 29.02.2012 at 18:07, Txema Heredia Genestar wrote:
> I want to control the usage of the local disk of our execution nodes. As far
> as I have found, the only related option offered by SGE is the h_fsize limit.
> But that will not work, because it just limits the maximum size of any file
> created in any filesystem, be it the local disk or the NFS shared volume.
>
> What I came up with is:
> 1- Create a load sensor for the usage percentage of the local disk of each
> host.
> 2- Add that sensor to the Suspend Threshold of all queues.
> 3- Create a consumable attribute "local_disk", with default value = 0KB (most
> jobs won't make any use of it).
> 4- Set the value of "local_disk" in each host.
>
> That way, whenever a job is sent, if it requests no disk space, nothing
> happens. If the job explicitly requests disk space, the job will be scheduled
> to a host with enough free space. If that job exceeds the requested disk
> space, "usually" nothing will happen. But if the job exceeds its disk space
> on a node with several other jobs using that disk, instead of filling the
> disk and crashing the jobs due to lack of space, all jobs will be suspended
> until the problem is manually fixed.
> I understand that this is not a true resource limit as with h_vmem, and it
> requires human conflict solving.
>
> Does anyone have a better idea?

A load sensor is covered in:

http://arc.liv.ac.uk/SGE/howto/loadsensor.html

I use it for a load_threshold that triggers when tmpfree falls below 1 GB in
/tmp. In addition, you can make tmpfree consumable and attach an initial value
to each exechost, which jobs can then request.

> Thanks in advance,
>
> Txema
>
> PS: Another possible option I thought about would be a prolog script (and the
> epilog cleanup equivalent) that, before the job starts:
> 1- Creates a group for the jobid, and assigns the group to the user.
> 2- Creates a group quota for the local disk with the requested local_disk
> value

And then terminates the job?
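The load sensor Reuti mentions (and step 1 of Txema's plan) could be sketched as below. It follows the stdin protocol described in the HOWTO: execd writes a line whenever it wants a load report and `quit` when the sensor should exit, and the sensor answers with `begin`, one `host:complex:value` line per value, and `end`. This is a minimal sketch, not the HOWTO's own script; the complex name `tmpfree`, the watched filesystem `/tmp` and the use of `uname -n` as the SGE host name are assumptions.

```shell
#!/bin/sh
# Sketch of an SGE load sensor reporting free space on a local filesystem.
# Assumptions: a complex named "tmpfree" (type MEMORY) exists, /tmp is the
# filesystem to watch (override via FS), and `uname -n` matches the host name
# SGE uses for this node.

FS=${FS:-/tmp}
HOST=$(uname -n)

# Print one load report: begin / host:complex:value / end.
report_tmpfree() {
    # df -P guarantees one line per filesystem; column 4 is available KB.
    free_kb=$(df -Pk "$FS" | awk 'NR == 2 { print $4 }')
    echo "begin"
    echo "$HOST:tmpfree:${free_kb}K"
    echo "end"
}

# execd writes a line to stdin whenever it wants a report and sends "quit"
# when the sensor should exit.
sensor_loop() {
    while read -r line; do
        [ "$line" = "quit" ] && return 0
        report_tmpfree
    done
}
```

To install it, the script would end with a call to `sensor_loop` and be set as the `load_sensor` parameter of the host configuration (`qconf -mconf <host>`). The `tmpfree` complex itself could be added with `qconf -mc` as a line like `tmpfree tf MEMORY <= YES YES 0 0` (name, shortcut, type, relop, requestable, consumable, default, urgency), and a per-host capacity set with `qconf -mattr exechost complex_values tmpfree=100G <host>`; the shortcut and sizes are placeholders. With relop `<=`, as for the built-in `mem_free`, a queue entry `load_thresholds tmpfree=1G` should put the queue into alarm state once less than 1 GB is left.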
> But that would be much more complicated and could add some unwanted
> complexity to the whole system.

Do your users stay in $TMPDIR? Then I think it would be easier to run a
`du -s *.all.q` and check whether any of them is above its request.

NB: There is a suspend_thresholds setting for queues, but unfortunately not
one for each individual job on its own.

===

Another approach, if the jobs stay on one node:

- in the job prolog, create a file with the requested size
- format it and mount it on $TMPDIR as a loop device
- in the epilog, unmount and remove it again

Well, creating and formatting will take some time, but the jobs can never
exceed the requested space, and it's guaranteed to be available.

-- 
Reuti

> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
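Reuti's simpler `du -s *.all.q` check could look like the following sketch. It assumes the job TMPDIRs are named `<job_id>.<task_id>.all.q` under `/tmp`, that jobs request a complex called `local_disk` (the name from Txema's plan), and that the request can be read from the `hard resource_list` line of `qstat -j`; the helper names and the exact `sed` pattern are hypothetical.

```shell
#!/bin/sh
# Sketch of the "du -s *.all.q" check: compare each job's $TMPDIR usage with
# the local disk it requested. Assumptions: TMPDIRs live under TMPBASE and are
# named <job_id>.<task_id>.all.q; jobs request a complex called local_disk.

TMPBASE=${TMPBASE:-/tmp}

# Convert an SGE memory value (e.g. 10G, 512M, 100K, 2048) to kilobytes.
# Upper-case suffixes are powers of 1024, lower-case ones powers of 1000.
to_kb() {
    n=${1%?}
    case $1 in
        *K) echo "$n" ;;
        *M) echo $(( $n * 1024 )) ;;
        *G) echo $(( $n * 1024 * 1024 )) ;;
        *k) echo $(( $n * 1000 / 1024 )) ;;
        *m) echo $(( $n * 1000000 / 1024 )) ;;
        *g) echo $(( $n * 1000000000 / 1024 )) ;;
        *)  echo $(( $1 / 1024 )) ;;
    esac
}

# Succeed if directory $1 uses more than $2 kilobytes of disk.
over_limit() {
    used_kb=$(du -sk "$1" 2>/dev/null | awk '{ print $1 }')
    [ "${used_kb:-0}" -gt "$2" ]
}

# Print the job's local_disk request in KB; fail if it requested nothing.
# The sed pattern is an assumption about the qstat -j output format.
requested_kb() {
    req=$(qstat -j "$1" 2>/dev/null |
          sed -n 's/.*local_disk=\([0-9][0-9]*[KkMmGg]\{0,1\}\).*/\1/p' | head -n 1)
    [ -n "$req" ] && to_kb "$req"
}

# Report every job whose TMPDIR has grown beyond its request.
check_all() {
    for d in "$TMPBASE"/*.all.q; do
        [ -d "$d" ] || continue
        job=$(basename "$d")
        job=${job%%.*}
        limit=$(requested_kb "$job") || continue
        if over_limit "$d" "$limit"; then
            echo "job $job: $d is above its local_disk request of $limit KB"
        fi
    done
}
```

Run periodically on each node, `check_all` only reports offenders; what to do with them (mail the user, suspend with `qmod -sj`, and so on) is left open, as in the thread.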

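Reuti's loop-device idea could be implemented as a pair of helper functions called from a prolog and epilog. This is a hedged sketch: it assumes the prolog/epilog run as root, that `mkfs.ext4` is available, and that the requested size (in MB) and the job owner are passed in; `IMG_DIR` and the function names are hypothetical. `-m 0` disables ext4's reserved blocks so that almost all of the image is usable by the job, although some space still goes to filesystem metadata.

```shell
#!/bin/sh
# Sketch of prolog/epilog helpers giving each job a fixed-size $TMPDIR backed
# by a loop-mounted image file. Assumptions (not from SGE itself):
#  - prolog/epilog run as root (e.g. "root@/path/to/script" in the queue config)
#  - IMG_DIR is a hypothetical local directory with room for the image files
#  - the requested size in MB and the job owner are passed in as arguments

IMG_DIR=${IMG_DIR:-/scratch/sge_tmpimages}

# Image file name derived from the job's TMPDIR, e.g. /tmp/1234.1.all.q
tmpdir_image() {
    echo "$IMG_DIR/$(basename "$1").img"
}

# Prolog: reserve the space, format it and mount it over $TMPDIR.
tmpdir_prolog() {
    dir=$1 size_mb=$2 owner=$3
    img=$(tmpdir_image "$dir")
    mkdir -p "$IMG_DIR" "$dir" || return 1
    # Writing the full size up front makes the space a real reservation.
    dd if=/dev/zero of="$img" bs=1024k count="$size_mb" 2>/dev/null || return 1
    mkfs.ext4 -q -F -m 0 "$img" || return 1
    mount -o loop "$img" "$dir" || return 1
    chown "$owner" "$dir"
}

# Epilog: unmount and delete the image again.
tmpdir_epilog() {
    dir=$1
    img=$(tmpdir_image "$dir")
    umount "$dir" && rm -f "$img"
}
```

A prolog might then call `tmpdir_prolog "$TMPDIR" 10240 "$USER"` and the epilog `tmpdir_epilog "$TMPDIR"`, assuming those job variables are visible there; how the prolog learns the requested size (for example from `qstat -j "$JOB_ID"`) is left open. The `dd` step is what guarantees the space, and also what makes job start-up slow for large requests.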