Yes, you are right: the real issue is the way we use SGE here. The contract with the user in this case is something like: "Just get me a full host, and I'll make sure I use its resources".
I worked around this by restricting this queue to a uniform partition of my cluster plus a JSV with the h_vmem hardcoded. Thanks for your kind advice!

2012/10/30 Reuti <[email protected]>:
> Hi,
>
> On 30.10.2012 at 21:56, Julien Nicoulaud wrote:
>
>>> On 29.10.2012 at 17:30, Julien Nicoulaud wrote:
>>>
>>>> I have a special queue for exclusive host access using a forced boolean
>>>> complex + subordinate queues, as described here:
>>>> https://blogs.oracle.com/templedf/entry/exclusive_host_access_with_grid.
>>>>
>>>> Now I'm in the process of setting up forced memory reservation:
>>>> • Turned h_vmem into a consumable resource
>>>> • Set up a value on each exec host
>>>> It works just fine except for the case of the exclusive queue: it makes
>>>> no sense getting exclusive access to a host and not being able to use
>>>> all its memory. Is there a way to:
>>>> • Somehow automatically set the requested h_vmem to the granted host's
>>>> h_vmem
>>>> • Or even just exclude this queue from h_vmem checking
>>>> Does anyone know a good "pattern" for dealing with this case?
>>>
>>> You mean: if someone requests exclusive access, to adjust h_vmem
>>> accordingly?
>>
>> Yes, I want to automatically set the job's h_vmem to the host maximum (as
>> configured with qconf -me <host>).
>>
>>> In principle a JSV (job submission verifier) could do that. But for
>>> parallel jobs, what is feasible might depend on the actual allocation
>>> used during scheduling. Are you also requesting a dedicated amount of
>>> cores per machine? Are you executing `qrsh -inherit` more than once to a
>>> slave node?
>>>
>>> Background for this question: on the master node of the parallel job,
>>> the job script will get h_vmem multiplied by the granted slots on this
>>> machine (as any h_vmem request is per slot). But for each `qrsh
>>> -inherit` it will be granted only once. So it could be necessary to
>>> request the number of machines instead, and for each to request the full
>>> memory.
>>
>> I do have some parallel jobs running in this queue, but no core binding
>> and no `qrsh -inherit`.
>>
>> But anyway, before handling the case of parallel jobs, I took a dive into
>> the JSV docs/samples, and I must say I'm quite confused about how you do
>> that with a JSV. I can't see how one can get information about the
>> "elected" host in the JSV, or am I missing something obvious?
>
> No, I was referring to a uniform cluster, and just to adjust:
>
>     $ qsub -l excl foobar.sh
>
> to
>
>     $ qsub -l excl,h_vmem=64G foobar.sh
>
> in case all hosts have 64G. The JSV is used at submission time to adjust
> resource requests according to some policy of the admin.
>
> If I think about it again with your heterogeneous cluster: why adjust at
> all? You know your exclusive job will need 16GB if it's scheduled to a
> 16GB node. Now it's being scheduled to a 64GB exec host; as we know that
> 16GB is sufficient, there is no need to change it to 64GB.
>
> -- Reuti
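For readers following along: the JSV approach discussed above can be sketched roughly as below, using the shell JSV interface shipped with SGE (`jsv_include.sh`, `jsv_sub_get_param`, `jsv_sub_add_param`). The complex name `excl`, the hardcoded `64G` for the uniform partition, and the check for a non-empty `excl` value are assumptions taken from this thread, not a tested configuration.

```shell
#!/bin/sh
# Sketch of a JSV that pins h_vmem to the full host memory for exclusive
# jobs, per the workaround described in this thread. Assumes a uniform
# partition where every exec host has 64G and the forced boolean complex
# is named "excl" (both are assumptions; adjust to your site).

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   # Hard resource requests (-l) live in the "l_hard" pseudo-parameter.
   # The exact value a boolean complex reports may vary, so we only
   # test that "excl" was requested at all.
   if [ -n "$(jsv_sub_get_param l_hard excl)" ]; then
      # Hardcode the per-host memory of the uniform partition.
      # Note: h_vmem is per slot, so for parallel jobs you may need to
      # divide this by the slot count (see Reuti's caveat above).
      jsv_sub_add_param l_hard h_vmem 64G
      jsv_correct "h_vmem set to full host memory for exclusive job"
   else
      jsv_accept "Job is accepted"
   fi
   return
}

# Standard JSV boilerplate: source the helper library, then hand control
# to the JSV main loop.
. "${SGE_ROOT}/util/resources/jsv/jsv_include.sh"
jsv_main
```

This would be registered either client-side (`qsub -jsv <script>`) or server-side via `jsv_url` in the global cluster configuration; server-side is the usual choice when the policy must be enforced rather than opt-in.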
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
