Is there a reason fairshare isn't being used? It sounds like, in your scenario, a single account is consuming most of the cluster's resources.
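A minimal way to try the functional fairshare policy would be something like the following sketch. The ticket and share values are illustrative examples, not recommendations -- check them against the docs for your 6.2u5 install:

```
# Give the functional policy some tickets in the scheduler config:
qconf -msconf
#   set: weight_tickets_functional 10000

# Have user objects created automatically with a default functional share,
# so every user competes on equal footing:
qconf -mconf
#   set: enforce_user      auto
#        auto_user_fshare  100
```

With that in place, pending jobs from the under-served user accumulate tickets relative to the heavy user, which should pull their jobs up the pending list without manual qalter -p adjustments.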
See "Scheduler Policies for Job Prioritization in the Sun N1 Grid Engine 6 System" by Charu Chaubal:
http://confluence.rcs.griffith.edu.au:8080/download/attachments/5668957/Sge-SCHEDULER+POLICIES-guide.pdf

Rayson

On Thu, Apr 26, 2012 at 12:20 PM, Stuart Barkley <[email protected]> wrote:
> Can someone give a quick overview of how job reservations are supposed
> to work?
>
> I have a large cluster where one user has ~1500 jobs executing, each of
> which takes several days to run. Every day several of the jobs finish
> and new ones from the array job start. These jobs have a minimal
> memory footprint (h_vmem=2G), so the new jobs fit exactly into the old
> footprint.
>
> I also have an array job from another user, but those tasks request
> h_vmem=20G. They have been starved for several days and none have
> started, since 20G never becomes free on a single host.
>
> h_vmem is consumable and generally works well at preventing memory
> over-allocation.
>
> In this case the blocking jobs turn over slowly, and I suspect
> fragmentation across nodes means that a single node is unlikely to
> become empty within any short time period.
>
> I have manually adjusted the job priority so the starved job is at the
> top of the waiting list (qalter -p 500).
>
> I've manually set the qalter "-R y" option, so the job should be
> considered for a reservation.
>
> I have "max_reservation 8" in sched_conf, so I believe an internal
> reservation should be made for the job.
>
> Previous experimentation with job reservations on faster-turnaround
> jobs appeared to have an effect; at some point it appeared that
> grid engine was clearing off nodes for larger jobs, but I didn't find
> any way to confirm that was actually what was happening.
>
> Is there any way to tell that grid engine has even noticed this and
> created a reservation? Is there a way to see what future resources
> have been "reserved"?
>
> Will the reservation adjust itself over time?
> The running jobs all have a huge h_rt, but the actual run time varies
> a lot. It would be good if grid engine would reallocate reservations
> as jobs end and better possibilities emerge.
>
> Current manual workaround:
>
> For this specific case I have created a temporary RQS limiting this
> specific user to 1000 slots, but I need to be sure I reset it once
> the currently blocked jobs get started.
>
> (still using SGE 6.2u5, CentOS 5)
>
> Thanks,
> Stuart Barkley
> --
> I've never been lost; I was once bewildered for three days, but never lost!
>     -- Daniel Boone

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
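To the reservation-visibility question in the quoted message: the scheduler can log its per-cycle decisions, including reservations, to a file. A sketch, assuming a default cell named "default" (paths may differ on your installation):

```
# Turn on scheduler decision logging:
qconf -msconf
#   set: params MONITOR=1

# Each scheduling run then appends records to the "schedule" file;
# RESERVING lines show resources being held back for a pending job:
grep RESERVING $SGE_ROOT/default/common/schedule
```

The file grows with every scheduling interval, so it is worth turning MONITOR off again once you have confirmed the reservation is being made.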
