Thanks, that would help debug your cluster. And in general, if you
believe there can be issues related to security and don't want to
disclose the logs, just use a sed script to remove all traces of the
domain names, server names, and/or job names, and replace with
something like "node1", "job2"...
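
For example, here is a minimal sketch of such a scrub. It assumes the logs mention a domain "example.com", hosts named like "compute-0-3", and a job called "myjob". All three patterns, and the function name anonymize_log, are hypothetical placeholders, so adjust them to whatever actually appears in your logs:

```shell
#!/bin/sh
# Hypothetical patterns: replace them with your real domain, host, and job names.
anonymize_log() {
    # 1) drop the domain suffix, 2) rename hosts to nodeN, 3) rename the job
    sed -e 's/\.example\.com//g' \
        -e 's/compute-0-\([0-9][0-9]*\)/node\1/g' \
        -e 's/myjob/job1/g'
}

# Usage (reads stdin, writes stdout):
#   anonymize_log < messages > messages.scrubbed
```

Keeping the host number in the replacement (nodeN) means the scrubbed qhost and qmaster output can still be cross-referenced host by host.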

Lastly, you can use "qhost -j" to see which host is running jobs
together with the usage info, and "qstat -j <job id>" to see why a
specific job is not running.

I will look at the logs tonight and see if there is anything obviously wrong...

Rayson



On Wed, Mar 16, 2011 at 4:53 PM, Lane Schwartz <[email protected]> wrote:
> On Sat, Mar 12, 2011 at 3:08 PM, Rayson Ho <[email protected]> wrote:
>>
>> Can you post your "qhost" output, together with "qstat -j" output??
>
>
> Attached are the results of running qhost, qstat -j, qstat, and the last
> 1000 lines of qmaster/messages following a call to qconf -tsm. I ran qstat,
> not qstat -u "*", so only my jobs are listed.
>
> I submitted 64 jobs, and very shortly thereafter about half of them moved
> into running state. Each job requested -l mem_free=3072M. As far as I can
> see there were slots with available resources free that should have accepted
> more of the jobs. The qstat log shows my jobs in the queue at that point.
>
> The call to qhost was made immediately prior to the call to qstat.
>
> The call to qstat -j shows no messages.
>
> Thanks,
> Lane
>

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
