On 11.03.2011 at 14:49, Lane Schwartz wrote:

> Rayson,
>
> Thanks for the pointer. In the qmon scheduler configuration, I have
> "Job Scheduling Information" set to true. I assume that's the same
> setting you're referring to?
>
> With this setting enabled, I still don't get very much info. When I
> run qstat -j on my jobs, the only thing it tells me is that a queue
> instance for a particular node is dropped because that node is
> disabled.

Because "disabled"? Did someone use `qmon` to disable the node or set up any calendar?

-- Reuti


> Thanks,
> Lane
>
> On Thu, Mar 10, 2011 at 4:28 PM, Rayson Ho <[email protected]> wrote:
>> Turn on "schedd_job_info", and run qstat -j to see why the scheduler
>> is not assigning jobs.
>>
>> http://gridscheduler.sourceforge.net/htmlman/htmlman5/sched_conf.html
>> http://gridscheduler.sourceforge.net/howto/troubleshooting.html
>>
>> Rayson
>>
>>
>>
>> On Thu, Mar 10, 2011 at 2:04 PM, Lane Schwartz <[email protected]> wrote:
>>> Hi,
>>>
>>> Lately I've noticed that many of my jobs take much longer than
>>> expected (sometimes up to half an hour) to go from pending to
>>> running, even when there are numerous nodes with sufficient resources
>>> available. Right now, for example, I've got a couple dozen jobs in
>>> pending, and 38 nodes where no jobs are running.
>>>
>>> I was wondering if anyone might be able to shed some light on why this
>>> might be. As I said, there are plenty of nodes with sufficient
>>> resources available to run the pending jobs, but they sometimes take a
>>> long time to go from pending to running.
>>>
>>> For reference, mem_free is set to consumable, and my jobs use the
>>> default value of 4GB for their requested mem_free. There are some
>>> other users' jobs which request more memory than that.
>>>
>>> The only clue I've been able to find is from examining the qmaster
>>> messages log file. It has lots of lines that look like the errors
>>> below:
>>>
>>> 03/10/2011 13:56:00|worker|t3n2|E|host load value "mem_free" exceeded:
>>> capacity is 66765959168.262146, job 495795 requests additional
>>> 68719476736.000000
>>> 03/10/2011 13:56:00|worker|t3n2|E|cannot start job 495795.1, as
>>> resources have changed during a scheduling run
>>> 03/10/2011 13:56:00|worker|t3n2|W|Skipping 108 remaining orders
>>> 03/10/2011 13:56:00|worker|t3n2|E|cannot start job 495795.1, as
>>> resources have changed during a scheduling run
>>>
>>> Any tips or pointers would be appreciated.
>>>
>>> Thanks,
>>> Lane
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>>>
>>
>
>
> --
> When a place gets crowded enough to require ID's, social collapse is not
> far away. It is time to go elsewhere. The best thing about space travel
> is that it made it possible to go elsewhere.
> -- R.A. Heinlein, "Time Enough For Love"
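
A side note on the qmaster messages quoted in this thread: the numbers in the "mem_free" line are easier to compare once converted from bytes. Below is a minimal Python sketch, not something posted by anyone in the thread. The regular expression and the helper name parse_exceeded are hypothetical and are derived only from the single log excerpt quoted above, so other Grid Engine versions may well format the message differently.

```python
import re

# Pattern for the "exceeded" message, inferred solely from the log excerpt
# quoted in the thread (assumption: the message text is stable).
EXCEEDED = re.compile(
    r'host load value "(?P<resource>[^"]+)" exceeded: '
    r'capacity is (?P<capacity>[\d.]+), '
    r'job (?P<job>\d+) requests additional (?P<request>[\d.]+)'
)

GIB = 1024 ** 3  # bytes per GiB


def parse_exceeded(line):
    """Return a dict describing the shortfall, or None if the line does not match."""
    m = EXCEEDED.search(line)
    if m is None:
        return None
    capacity = float(m.group("capacity"))
    request = float(m.group("request"))
    return {
        "resource": m.group("resource"),
        "job": int(m.group("job")),
        "capacity_gib": capacity / GIB,
        "request_gib": request / GIB,
        "shortfall_gib": (request - capacity) / GIB,
    }


# The first log line from the thread, with the mail wrapping undone.
sample = (
    '03/10/2011 13:56:00|worker|t3n2|E|host load value "mem_free" exceeded: '
    'capacity is 66765959168.262146, job 495795 requests additional '
    '68719476736.000000'
)

info = parse_exceeded(sample)
print(info)
```

For the quoted line, the request works out to exactly 64 GiB against a reported capacity of roughly 62.2 GiB, which is well above the 4GB default mentioned earlier in the thread.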
