Yes, you are right, the real issue is the way we use SGE in this case.
Actually, the contract with the user in this case is something like: "Just
get me a full host, and I'll make sure I use its resources".

I worked around this by restricting this queue to a uniform partition of
my cluster + a JSV with the h_vmem hardcoded.
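For the record, the verify callback of that JSV boils down to something like
this. A minimal sketch only: it assumes the forced boolean complex is named
`excl` and that every host of the uniform partition has h_vmem=64G configured;
a deployed script would additionally source
$SGE_ROOT/util/resources/jsv/jsv_include.sh and end with a call to jsv_main.

```shell
# Sketch of a JSV verify callback that hardcodes h_vmem for exclusive jobs.
# Assumed names (adapt to your site): the forced boolean complex is "excl",
# and all hosts of the uniform partition have h_vmem=64G.
# A real JSV must first source $SGE_ROOT/util/resources/jsv/jsv_include.sh
# and finish with jsv_main.

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   if [ "$(jsv_sub_get_param l_hard excl)" = "true" ] && \
      [ -z "$(jsv_sub_get_param l_hard h_vmem)" ]; then
      # Exclusive job without an explicit h_vmem: pin it to the host maximum.
      jsv_sub_add_param l_hard h_vmem 64G
      jsv_correct "exclusive job: h_vmem set to host maximum"
   else
      jsv_accept "no change needed"
   fi
}
```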

Thanks for your kind advice!

2012/10/30 Reuti <[email protected]>

> Hi,
>
> On 30.10.2012 at 21:56, Julien Nicoulaud wrote:
>
> On 29.10.2012 at 17:30, Julien Nicoulaud wrote:
>>
>> > I have a special queue for exclusive host access using a forced boolean
>> complex + subordinate queues, as described here:
>> https://blogs.oracle.com/templedf/entry/exclusive_host_access_with_grid.
>> >
>> > Now I'm in the process of setting up forced memory reservation:
>> >       • Turned h_vmem into a consumable resource
>> >       • Set up a value on each exec host
>> > It works just fine except for the case of the exclusive queue: it makes
>> no sense getting exclusive access to a host and not being able to use all
>> its memory. Is there a way to:
>> >       • Somehow automatically set requested h_vmem to granted host
>> h_vmem
>> >       • Or even just exclude this queue from h_vmem checking
>> > Does anyone know a good "pattern" for dealing with this case ?
>>
>> you mean: if someone requests exclusive access, adjust h_vmem accordingly?
>>
> Yes, I want to automatically set the job h_vmem to the host max (as
> configured with qconf -me <host>).
>
>
>>
>> In principle a JSV (job submission verifier) could do this. But for
>> parallel jobs, what is feasible might depend on the actual allocation
>> used during scheduling. Are you also requesting a dedicated number of
>> cores per machine? Are you executing `qrsh -inherit` more than once to
>> a slave node?
>>
>> The background for this question is that on the master node of the
>> parallel job, the job script will get h_vmem multiplied by the slots
>> granted on that machine (as any h_vmem request is per slot), but each
>> `qrsh -inherit` will be granted the request only once. So it could be
>> necessary to request a number of machines instead and the full memory
>> on each.
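(For concreteness, the per-slot accounting Reuti describes works out like this
with hypothetical numbers, not taken from this thread: an 8-slot PE job
submitted with `-l h_vmem=4G`, where 4 slots are granted on the master host
and 4 on a slave host.)

```shell
# Hypothetical numbers: 8-slot parallel job, -l h_vmem=4G,
# 4 slots granted on the master host, 4 on a slave host.
h_vmem_gb=4
master_slots=4

# On the master host, the job script runs under the per-slot request
# multiplied by the slots granted there:
echo "master job script limit: $(( h_vmem_gb * master_slots ))G"   # prints 16G

# Each qrsh -inherit to the slave host is granted the request only once:
echo "per qrsh -inherit limit: ${h_vmem_gb}G"                      # prints 4G
```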
>>
> I do have some parallel jobs running in this queue, but no core binding,
> and no "qrsh -inherit".
>
> But anyway, before handling the case of parallel jobs, I took a dive into
> the JSV docs/samples, and I must say I'm quite confused on how you do that
> with a JSV. I can't see how one can get information about the "elected"
> host in the JSV, or am I missing something obvious ?
>
>
> No, I was referring to a uniform cluster, and just adjusting:
>
> $ qsub -l excl foobar.sh
>
> to
>
> $ qsub -l excl,h_vmem=64G foobar.sh
>
> in case all have 64G. The JSV is used at submission time to adjust
> resource requests according to some policy of the admin.
>
> If I think about it again with your heterogeneous cluster: why adjust at
> all? You know your exclusive job will need 16GB if scheduled to a 16GB
> node. Now it's being scheduled to a 64GB exechost - as we know that 16GB
> is sufficient, there is no need to change it to 64GB.
>
> -- Reuti
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
