On 14/12/12 4:15 AM, "Dave Love" <[email protected]> wrote:
>Schmidt U. <[email protected]> writes:
>
>> h_vmem requestable and consumable could lead to other problems as we
>> had in our cluster with jobs of about 200 slots. There is an overhead
>> in h_vmem for the first node.
>> Look at:
>>http://gridengine.org/pipermail/users/2011-September/001636.html
>
>A while ago I posted a counterexample -- we still don't understand the
>trans-Pennine difference and I'd be interested if anyone has good ideas.
>If it's a problem, I reckon the first thing to try is tree-spawning the
>MPI processes, assuming your MPI supports it.

That could be interesting. We've noticed that whenever MPIs and PEs are involved in this mix, things become far more complicated. For example, we have watched Grid Engine bounce around trying to secure enough nodes that are "clear", or that have the slots and memory available for a PE to "sit" nicely across several of them. We know full well there are enough resources to do it, and the user has been very courteous about specifying sane complex values, yet the PE ultimately sits at "Waiting for resources".

>
>The next SGE release will account memory differently on Linux (when the
>information is available from /proc) to keep Mark happy, and it could be
>clever about what memory segments it includes. The change is mixed up
>with other changes, and isn't pushed yet.
>
>> When we introduced h_vmem consumable, all the parallel jobs had to
>> define a bigger h_vmem than necessary. So we started to "waste" the
>> RAM of the nodes. The use of #$ -l exclusive=true was solving the
>> problem a little bit, but increased waiting time for large jobs. Now
>> we gave up the use of h_vmem consumable and are using only
>> virtual_free.
>
>Good if it works, but in our case it would often cause jobs to fail, or
>thrash if we allowed the swapping.
>
>The ultimate computing slogan is "Your Mileage May Vary".  -- Richard
>O'Keefe

Indeed.
Not that I'm ungrateful for a piece of free software that has done so much good for us, for so long, but it is frustrating at times. I guess this is just one of those complex problems that may not have a sane general solution (in the most simplistic computational-complexity sense).

--JC

>
>--
>Community Grid Engine: http://arc.liv.ac.uk/SGE/
>_______________________________________________
>users mailing list
>[email protected]
>https://gridengine.org/mailman/listinfo/users
