On 14/12/12 4:15 AM, "Dave Love" <[email protected]> wrote:

>Schmidt U. <[email protected]> writes:
>
>> h_vmem requestable and consumable could lead to other problems as we
>> had in our cluster with jobs of about 200 slots. There is an overhead
>> in h_vmem for the first node.
>> Look at:
>>http://gridengine.org/pipermail/users/2011-September/001636.html
>
>A while ago I posted a counterexample -- we still don't understand the
>trans-Pennine difference and I'd be interested if anyone has good ideas.
>If it's a problem, I reckon the first thing to try is tree-spawning the
>MPI processes, assuming your MPI supports it.

That could be interesting. We've noticed that whenever we involve our
MPIs and PEs in this mix, things become far more complicated.
Examples include watching Grid Engine bounce around trying to secure
enough nodes that are "clear", or that have the slots and memory
available, for a PE to "sit" nicely across several of them. We know
full well there are enough resources to do it, and the user has been
very courteous about specifying sane complex values, but ultimately
we watch the PE job sit at "Waiting for resources".
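
As a toy illustration of how that can happen (this is not Grid Engine's
actual scheduling logic, and the host figures and the placeable_slots
helper are made up for the example): if a memory consumable is charged
per slot, a job can fit the cluster's totals and still not fit host by
host.

```python
def placeable_slots(hosts, mem_per_slot):
    """Count the slots a job could get, host by host (toy model).

    hosts is a list of (free_slots, free_mem_gb) tuples; each slot is
    assumed to consume mem_per_slot GB of the per-host memory consumable.
    """
    total = 0
    for free_slots, free_mem in hosts:
        # A host can only offer as many slots as its free memory covers.
        total += min(free_slots, free_mem // mem_per_slot)
    return total


# Hypothetical cluster state: (free slots, free memory in GB) per host.
hosts = [(8, 4), (8, 4), (2, 24), (2, 24)]
need_slots, mem_per_slot = 16, 3

# Looking only at cluster-wide totals, the job appears to fit.
aggregate_slots = sum(slots for slots, _ in hosts)
aggregate_mem = sum(mem for _, mem in hosts)
fits_in_aggregate = (aggregate_slots >= need_slots
                     and aggregate_mem >= need_slots * mem_per_slot)

# Checked host by host, far fewer slots are actually usable.
fits_per_host = placeable_slots(hosts, mem_per_slot) >= need_slots

print(fits_in_aggregate, fits_per_host)
```

The free slots sit on hosts with little free memory and the free memory
sits on hosts with few free slots, so the totals look fine while the
job still cannot be placed.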

>
>The next SGE release will account memory differently on Linux (when the
>information is available from /proc) to keep Mark happy, and it could be
>clever about what memory segments it includes.  The change is mixed up
>with other changes, and isn't pushed yet.
>
>> When we introduced h_vmem as a consumable, all the parallel jobs had
>> to define a bigger h_vmem than necessary. So we started to "waste" the
>> RAM of the nodes. The use of #$ -l exclusive=true was solving the
>> problem a little bit, but increased waiting time for large jobs. Now
>> we have given up using h_vmem as a consumable and are using only
>> virtual_free.
>
>Good if it works, but in our case it would often cause jobs to fail, or
>thrash if we allowed the swapping.
>
>The ultimate computing slogan is "Your Mileage May Vary".  -- Richard
>O'Keefe

Indeed. Not that I'm ungrateful for a bit of software that is free and has
done so much good for us, for so long - but it is frustrating at times. I
guess this is just one of those complex problems to solve, that may not
have a sane solving rule (in the most simplistic computer science
computational complexity sense).

--JC


>
>-- 
>Community Grid Engine:  http://arc.liv.ac.uk/SGE/
>_______________________________________________
>users mailing list
>[email protected]
>https://gridengine.org/mailman/listinfo/users

