"Gavin W. Burris" <[email protected]> writes:

> On 09/11/2012 12:36 PM, Dave Love wrote:
>> Of course there probably should be special provision for resources on
>> the master host, and possibly excluding qrsh from the accounting.
>>
>
> This must be a very common problem.  I have wrestled with this myself
> when crafting Open MPI jobs.  Are there any work-arounds or options we
> are missing?  It is frustrating to block so much memory, multiplied out
> by the number of slots, just for one process.

It sounds as if you need a better MPI; under those circumstances, I
guess you'll have problems with very large jobs somewhat independently
of SGE.  The Open MPI versions I've used start one qrsh per slave by
default, and presumably use O(log slaves) with tree spawning --
definitely not O(slots).

I've never bothered about this because it's in the noise here with
32-host jobs, which are the biggest we typically run at present because
of multiple fabrics and heterogeneous hosts.  See the data I posted when
this came up recently.

Anyone who wants to work on it would be very welcome.

-- 
Community Grid Engine: http://arc.liv.ac.uk/SGE/
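
To put rough numbers on the scaling argument above, here is a minimal sketch that counts how many remote launches (qrsh processes) the master would start directly under three launch schemes. It is only an illustration: the function name, the 8 slots per host, the reading of "slave" as "slave host", and the binomial-tree shape are all assumptions of mine, not a description of how Open MPI or SGE actually behave.

```python
def master_launches(hosts: int, slots_per_host: int, scheme: str) -> int:
    """Count remote launches started directly by the master host.

    Hypothetical model: one master host plus (hosts - 1) slave hosts.
    """
    slaves = hosts - 1  # the master host itself needs no remote launch
    if scheme == "per_slot":
        # One launch per remote slot: O(slots).
        return slaves * slots_per_host
    if scheme == "per_host":
        # One launch per slave host: O(slaves).
        return slaves
    if scheme == "tree":
        # Assumed binomial tree: the root has ceil(log2(hosts)) children,
        # computed with integer arithmetic to avoid float rounding.
        return (hosts - 1).bit_length()
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    # 32 hosts as in the jobs mentioned above; 8 slots per host is assumed.
    for scheme in ("per_slot", "per_host", "tree"):
        print(scheme, master_launches(32, 8, scheme))
```

Under these assumptions the per-slot count grows with the total slot count, while the per-host and tree counts depend only on the number of hosts, which is why any memory charged per qrsh on the master matters far less with the latter two.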
