Thanks, that would help debug your cluster. In general, if you think the logs might expose security-sensitive details and you don't want to disclose them as-is, just use a sed script to remove all traces of the domain names, server names, and/or job names, and replace them with placeholders like "node1", "job2"...
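As a rough sketch of that kind of sed script (every host, domain, and job name below is made up for illustration, so substitute the real names from your own site), something along these lines should work. Note that the full host names are replaced before the bare domain, so the more specific patterns get a chance to match first:

```shell
# Sketch only: "bigiron", "smalliron", "cs.example.edu" and "translate_corpus"
# are hypothetical placeholders, not names taken from any real cluster.
anonymize() {
    sed -e 's/bigiron\.cs\.example\.edu/node1/g' \
        -e 's/smalliron\.cs\.example\.edu/node2/g' \
        -e 's/cs\.example\.edu/example.org/g' \
        -e 's/translate_corpus/job1/g'
}

# Typical use, reading the log on stdin and writing the scrubbed copy:
#   anonymize < messages > messages.anon
```

It is worth grepping the scrubbed file for your real domain afterwards, in case a name appears in a form the patterns do not cover.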
Lastly, you can use "qhost -j" to see which host is running jobs together with the usage info, and "qstat -j <job id>" to see why a specific job is not running.

I will look at the logs tonight and see if there is anything obviously wrong...

Rayson

On Wed, Mar 16, 2011 at 4:53 PM, Lane Schwartz <[email protected]> wrote:
> On Sat, Mar 12, 2011 at 3:08 PM, Rayson Ho <[email protected]> wrote:
>>
>> Can you post your "qhost" output, together with "qstat -j" output??
>
>
> Attached are the results of running qhost, qstat -j, qstat, and the last
> 1000 lines of qmaster/messages following a call to qconf -tsm. I ran qstat,
> not qstat -u "*", so only my jobs are listed.
>
> I submitted 64 jobs, and very shortly thereafter about half of them moved
> into running state. Each job requested -l mem_free=3072M. As far as I can
> see there were slots with available resources free that should have accepted
> more of the jobs. The qstat log shows my jobs in the queue at that point.
>
> The call to qhost was made immediately prior to the call to qstat.
>
> The call to qstat -j shows no messages.
>
> Thanks,
> Lane
