Re: [gridengine users] Long delay starting jobs, even when compute nodes are empty

Rayson Ho Thu, 10 Mar 2011 13:28:23 -0800

Turn on "schedd_job_info" in the scheduler configuration (qconf -msconf),
then run "qstat -j <job_id>" on a pending job to see why the scheduler
is not dispatching it.


http://gridscheduler.sourceforge.net/htmlman/htmlman5/sched_conf.html
http://gridscheduler.sourceforge.net/howto/troubleshooting.html

Rayson



On Thu, Mar 10, 2011 at 2:04 PM, Lane Schwartz <[email protected]> wrote:
> Hi,
>
> Lately I've noticed that many of my jobs take much longer than
> expected (sometimes up to half an hour) to go from pending to
> running, even when there are numerous nodes with sufficient resources
> available. Right now, for example, I've got a couple dozen jobs in
> pending, and 38 nodes where no jobs are running.
>
> I was wondering if anyone might be able to shed some light on why this
> might be. As I said, there are plenty of nodes with sufficient
> resources available to run the pending jobs, but they sometimes take a
> long time to go from pending to running.
>
> For reference, mem_free is set to consumable, and my jobs use the
> default value of 4GB for their requested mem_free. There are some
> other users' jobs which request more memory than that.
>
> The only clue I've been able to find is from examining the qmaster
> messages log file. It has lots of lines that look like the errors
> below:
>
> 03/10/2011 13:56:00|worker|t3n2|E|host load value "mem_free" exceeded:
> capacity is 66765959168.262146, job 495795 requests additional
> 68719476736.000000
> 03/10/2011 13:56:00|worker|t3n2|E|cannot start job 495795.1, as
> resources have changed during a scheduling run
> 03/10/2011 13:56:00|worker|t3n2|W|Skipping 108 remaining orders
> 03/10/2011 13:56:00|worker|t3n2|E|cannot start job 495795.1, as
> resources have changed during a scheduling run
>
> Any tips or pointers would be appreciated.
>
> Thanks,
> Lane
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
>
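
For what it is worth, the numbers in the quoted qmaster message can be
checked directly. The following is only a rough sketch in Python, not part
of Grid Engine: the regular expression and the helper name parse_exceeded
are made up for illustration, and it assumes the log message sits on a
single line (the mail client wrapped it above). It converts the byte
counts to GiB so the capacity and the request are easier to compare.

```python
import re

# Pattern for the qmaster "load value exceeded" message quoted above.
# Assumption: the message is on one line in the real log file.
EXCEEDED = re.compile(
    r'host load value "(?P<resource>[^"]+)" exceeded: '
    r'capacity is (?P<capacity>[\d.]+), '
    r'job (?P<job>\d+) requests additional (?P<request>[\d.]+)'
)

GIB = 1024 ** 3  # bytes per GiB


def parse_exceeded(line):
    """Return resource, job id, capacity and request (in bytes), or None."""
    # Collapse runs of whitespace so a re-joined wrapped line still matches.
    m = EXCEEDED.search(" ".join(line.split()))
    if m is None:
        return None
    return {
        "resource": m.group("resource"),
        "job": int(m.group("job")),
        "capacity": float(m.group("capacity")),
        "request": float(m.group("request")),
    }


line = (
    '03/10/2011 13:56:00|worker|t3n2|E|host load value "mem_free" exceeded: '
    'capacity is 66765959168.262146, job 495795 requests additional '
    '68719476736.000000'
)

info = parse_exceeded(line)
shortfall = info["request"] - info["capacity"]
print(f'job {info["job"]} requests {info["request"] / GIB:.2f} GiB '
      f'of {info["resource"]}')
print(f'capacity is {info["capacity"] / GIB:.2f} GiB, '
      f'short by {shortfall / GIB:.2f} GiB')
```

On the quoted line this gives a request of 64 GiB against a capacity of
roughly 62.18 GiB, which is well above a single default 4 GB request.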
