On 01.02.2013, at 23:53, [email protected] wrote:

> I'm looking for suggestions for dealing with h_vmem requirements for
> multi-slot jobs.
>
> We use memory as a consumable and a required complex.
>
> I understand that SGE multiplies the h_vmem request by the number of slots
> in order to determine the job's memory requirement.
>
> In our environment, there are processing pipelines that take parameters
> to control the number of child processes launched by the job.
>
> For these jobs, the high-water mark in memory use is independent of the
> number of child processes.
>
> For example, a job will begin with a single-threaded section that uses
> 2GB of RAM, then launch "N" child processes that use 500MB each, then
> finish with a section that assembles the results of the child processes
> and requires 8GB.
>
> That example job would be submitted with the option "-l h_vmem=8G".
>
> Users are aware that they must give a "-pe threaded N" parameter to SGE
> when they run the job with "N" child processes.
>
> Typically, users will run this type of job with either zero or between
> 3 and 6 child processes.
>
> I've written a JSV to divide the user-supplied h_vmem value by the number
> of slots and reset h_vmem. This allows users to avoid recalculating the
> memory requirement whenever they submit a job with more than one child
> process.
>
> This causes a problem if the JSV-calculated h_vmem value is lower than
> the actual memory use, and SGE kills the job. For example, if the job
> described above is submitted with:
>
> -pe threaded 6 -l h_vmem=8G
>
> the JSV readjusts h_vmem to 1.5G, and the job is killed when it tries
> to use 8GB.
>
> Without the JSV, when users submit these jobs with a parameter to launch
> multiple child processes (with the corresponding "-pe threaded" option),
> SGE will set a higher-than-needed memory requirement (48GB in the above
> example). This means that the job cannot be scheduled if it appears
> to exceed the memory of our largest server. If the job can be run,
> it usually waits a long time for a machine with sufficient memory to
> become available, and then it blocks other users from running jobs on the
> same node because SGE treats memory as a consumable.
>
> Is there a way to tell SGE not to multiply the user-supplied h_vmem
> request by the number of requested slots?
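To make the arithmetic above concrete, here is a small sketch in plain Python (not SGE code; the helper names are hypothetical) of how the scheduler's per-slot accounting turns the example request into a 48GB reservation:

```python
# Hypothetical helpers illustrating SGE's per-slot accounting of a
# consumable h_vmem request; these are not part of any SGE API.

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_mem(spec: str) -> int:
    """Parse an SGE-style memory string such as '8G' or '500M' into bytes."""
    suffix = spec[-1].upper()
    if suffix in UNITS:
        return int(float(spec[:-1]) * UNITS[suffix])
    return int(spec)

def total_reservation(h_vmem: str, slots: int) -> int:
    """SGE charges a per-slot consumable as h_vmem * slots."""
    return parse_mem(h_vmem) * slots

# The example job: '-pe threaded 6 -l h_vmem=8G'
total = total_reservation("8G", 6)
print(total // 1024**3, "GB")  # 48 GB reserved, though the job peaks at 8 GB
```

This is why the 8GB job "costs" 48GB of the consumable on a 6-slot submission, and why it can become unschedulable on hosts that could actually run it.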
In addition to the other post: dividing should work too. For the job script itself, the ulimit is set to the memory request multiplied by the number of slots granted on the master node of the parallel job. This is not done for `qrsh` processes, though. Hence, for a plain threaded job where everything is bound to the job script, it should work. Sometimes it's tricky or not possible in SGE to set up different features for different types of jobs and/or different scheduling policies.

-- Reuti

> Is there another parameter that could be changed within the JSV to preserve
> the user-supplied h_vmem and prevent SGE from trying to require excessive
> memory?
>
> Do you have any suggestions in terms of user training and education to
> explain this situation so that they can submit single- or multi-slot
> jobs with appropriate memory requests?
>
> Thanks,
>
> Mark
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
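The divide-by-slots adjustment the JSV performs, and the pitfall Mark describes, can be sketched in plain Python. (A real client-side JSV would use the jsv_* helper functions shipped with SGE to read and rewrite the l_hard list; this only shows the arithmetic, and the function names are illustrative.)

```python
# Sketch of the JSV's divide-by-slots adjustment; illustrative only,
# not SGE's JSV API.

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_mem(spec: str) -> int:
    """Parse an SGE-style memory string such as '8G' into bytes."""
    suffix = spec[-1].upper()
    if suffix in UNITS:
        return int(float(spec[:-1]) * UNITS[suffix])
    return int(spec)

def per_slot_h_vmem(h_vmem: str, slots: int) -> int:
    """Divide the user's total request across slots, as the JSV does.

    Ceiling division avoids losing bytes to rounding, but the resulting
    per-slot value is still far below the 8GB that the single-threaded
    phases of the example job actually need -- which is why the job
    gets killed when its ulimit is enforced per slot.
    """
    total = parse_mem(h_vmem)
    return -(-total // slots)  # ceiling division

adjusted = per_slot_h_vmem("8G", 6)
print(round(adjusted / 1024**2), "MB per slot")
```

Note that on the master node the ulimit is this per-slot value multiplied by the slots granted there, so for a threaded job where all slots land on one host, the job script's limit comes back out near the original request.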
