On 11.03.2011 at 14:58, Reuti wrote:
> On 11.03.2011 at 14:49, Lane Schwartz wrote:
>
>> Rayson,
>>
>> Thanks for the pointer. In the qmon scheduler configuration, I have
>> "Job Scheduling Information" set to true. I assume that's the same
>> setting you're referring to?
>>
>> With this setting enabled, I still don't get very much info. When I
>> run qstat -j on my jobs, the only thing it tells me is that a queue
>> instance for a particular node is dropped because that node is
>> disabled.
>
> Why "disabled"? Did someone use `qmon` to disable the node or set up any
> calendar?

That should read `qmod`, but both can be used.

-- Reuti

>
> -- Reuti
>
>
>> Thanks,
>> Lane
>>
>> On Thu, Mar 10, 2011 at 4:28 PM, Rayson Ho <[email protected]> wrote:
>>> Turn on "schedd_job_info", and run qstat -j to see why the scheduler
>>> is not assigning jobs.
>>>
>>> http://gridscheduler.sourceforge.net/htmlman/htmlman5/sched_conf.html
>>> http://gridscheduler.sourceforge.net/howto/troubleshooting.html
>>>
>>> Rayson
>>>
>>> On Thu, Mar 10, 2011 at 2:04 PM, Lane Schwartz <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> Lately I've noticed that many of my jobs take much longer than
>>>> expected (sometimes up to half an hour) to go from pending to
>>>> running, even when there are numerous nodes with sufficient resources
>>>> available. Right now, for example, I've got a couple dozen jobs
>>>> pending, and 38 nodes where no jobs are running.
>>>>
>>>> I was wondering if anyone might be able to shed some light on why this
>>>> might be. As I said, there are plenty of nodes with sufficient
>>>> resources available to run the pending jobs, but they sometimes take a
>>>> long time to go from pending to running.
>>>>
>>>> For reference, mem_free is set to consumable, and my jobs use the
>>>> default value of 4GB for their requested mem_free. There are some
>>>> other users' jobs which request more memory than that.
>>>>
>>>> The only clue I've been able to find is from examining the qmaster
>>>> messages log file.
>>>> It has lots of lines that look like the errors
>>>> below:
>>>>
>>>> 03/10/2011 13:56:00|worker|t3n2|E|host load value "mem_free" exceeded:
>>>> capacity is 66765959168.262146, job 495795 requests additional
>>>> 68719476736.000000
>>>> 03/10/2011 13:56:00|worker|t3n2|E|cannot start job 495795.1, as
>>>> resources have changed during a scheduling run
>>>> 03/10/2011 13:56:00|worker|t3n2|W|Skipping 108 remaining orders
>>>> 03/10/2011 13:56:00|worker|t3n2|E|cannot start job 495795.1, as
>>>> resources have changed during a scheduling run
>>>>
>>>> Any tips or pointers would be appreciated.
>>>>
>>>> Thanks,
>>>> Lane
>>>> _______________________________________________
>>>> users mailing list
>>>> [email protected]
>>>> https://gridengine.org/mailman/listinfo/users
>>
>> --
>> When a place gets crowded enough to require ID's, social collapse is not
>> far away. It is time to go elsewhere. The best thing about space travel
>> is that it made it possible to go elsewhere.
>> -- R.A. Heinlein, "Time Enough For Love"
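The first qmaster log line in Lane's original message compares a host's `mem_free` capacity with a job's request, both given in bytes, which makes the numbers hard to read at a glance. The following Python sketch parses one such pipe-delimited line and converts both figures to GiB. It is a sketch under stated assumptions: the field layout (timestamp, thread, host, severity, message) is inferred only from the sample lines quoted in this thread, not from Grid Engine documentation, and `parse_line`, `check_exceeded` and `LogEntry` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

GIB = 2 ** 30  # bytes per GiB


class LogEntry(NamedTuple):
    """One qmaster 'messages' line, layout assumed: timestamp|thread|host|severity|message."""
    timestamp: str
    thread: str
    host: str
    severity: str  # one-letter level; the sample lines use E and W
    message: str


def parse_line(line: str) -> LogEntry:
    # Hypothetical helper: assumes the five-field layout seen in the sample
    # lines; split at most 4 times so any '|' inside the message text is kept.
    timestamp, thread, host, severity, message = line.rstrip("\n").split("|", 4)
    return LogEntry(timestamp, thread, host, severity, message)


# Pattern copied from the "host load value ... exceeded" sample line (assumed wording).
EXCEEDED = re.compile(
    r'host load value "(?P<name>[^"]+)" exceeded: '
    r"capacity is (?P<capacity>[\d.]+), "
    r"job (?P<job>\d+) requests additional (?P<request>[\d.]+)"
)


def check_exceeded(message: str) -> Optional[dict]:
    """Return capacity vs. request in GiB, or None if the message does not match."""
    m = EXCEEDED.search(message)
    if m is None:
        return None
    capacity = float(m["capacity"])
    request = float(m["request"])
    return {
        "resource": m["name"],
        "job": int(m["job"]),
        "capacity_gib": capacity / GIB,
        "request_gib": request / GIB,
        "shortfall_gib": (request - capacity) / GIB,
    }


if __name__ == "__main__":
    line = (
        '03/10/2011 13:56:00|worker|t3n2|E|host load value "mem_free" exceeded: '
        "capacity is 66765959168.262146, job 495795 requests additional 68719476736.000000"
    )
    entry = parse_line(line)
    info = check_exceeded(entry.message)
    print(entry.host, entry.severity, info["job"])  # t3n2 E 495795
    print(f"{info['request_gib']:.2f} GiB requested vs {info['capacity_gib']:.2f} GiB available")
    # 64.00 GiB requested vs 62.18 GiB available
```

For the sample line, job 495795 asks for exactly 64 GiB while host t3n2 reports roughly 62.18 GiB, so the request exceeds the reported capacity by about 1.82 GiB.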
