Are there any jobs showing in qstat? Have you tried rebooting the head node
completely and then rebooting the compute nodes?

That's probably the easiest way to "reset" things once they get into a funny
state. I don't have a Maui/Torque install to look at right now, so I can't
point you to a nicer way of resetting things.

Is the maui service running? Maui and Torque are fairly tightly coupled,
and if one is down the other tends to act strangely.
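
If you want to watch whether the nodes drift back to down after you set them
free, something like the following could summarize the pbsnodes -a output.
This is only a rough sketch. It assumes the output has the layout you pasted
(a node name line, then "key = value" lines, with a blank line between
nodes), and the names parse_pbsnodes, nodes_not_free and SAMPLE are made up
for illustration. I haven't run it against a live Torque install.

```python
# Sketch only: parse text in the layout printed by `pbsnodes -a`.
# Function names and SAMPLE are illustrative, not part of Torque.

def parse_pbsnodes(text):
    """Return {node_name: {attribute: value}} from `pbsnodes -a` style text."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None              # a blank line ends a node record
            continue
        if "=" in line and current is not None:
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
        else:
            current = line              # first line of a record is the node name
            nodes[current] = {}
    return nodes


def nodes_not_free(nodes):
    """Sorted names of nodes whose state is anything other than 'free'."""
    return sorted(name for name, attrs in nodes.items()
                  if attrs.get("state") != "free")


# First two records from the output quoted below, used as test input.
SAMPLE = """\
node01.calculo.invap.com.ar
     state = down
     np = 4
     properties = all
     ntype = cluster

node02.calculo.invap.com.ar
     state = down
     np = 4
     properties = all
     ntype = cluster
"""

for name in nodes_not_free(parse_pbsnodes(SAMPLE)):
    print(name)
```

On the head node you would pass in the real text instead of SAMPLE, for
example the stdout captured from running pbsnodes -a with subprocess.run.
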
On 10/9/07, Annette Sahores <[EMAIL PROTECTED]> wrote:
>
> Michael,
> I've installed 1 server and 6 nodes.
> All of them with Red Hat Enterprise Linux 4 Update 5
> and Oscar Version 5.0
> I can ping to every node and ssh to it and I am also able to run some
> fortran tests using mpirun.
> The output of pbsnodes -a is:
> node01.calculo.invap.com.ar
>      state = down
>      np = 4
>      properties = all
>      ntype = cluster
>
> node02.calculo.invap.com.ar
>      state = down
>      np = 4
>      properties = all
>      ntype = cluster
>
> node03.calculo.invap.com.ar
>      state = down
>      np = 4
>      properties = all
>      ntype = cluster
>
> node04.calculo.invap.com.ar
>      state = down
>      np = 4
>      properties = all
>      ntype = cluster
>
> node05.calculo.invap.com.ar
>      state = down
>      np = 4
>      properties = all
>      ntype = cluster
>
> node06.calculo.invap.com.ar
>      state = down
>      np = 4
>      properties = all
>      ntype = cluster
>
> I don't know how to change their state permanently. I tried using
> qmgr: set node node01.calculo.invap.com.ar state=free
> on each node, but after a while the state becomes down again.
>
> Here's the output of qmgr -c 'p s':
>
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue workq
> #
> create queue workq
> set queue workq queue_type = Execution
> set queue workq resources_max.cput = 10000:00:00
> set queue workq resources_max.ncpus = 24
> set queue workq resources_max.nodect = 6
> set queue workq resources_max.walltime = 10000:00:00
> set queue workq resources_min.cput = 00:00:01
> set queue workq resources_min.ncpus = 1
> set queue workq resources_min.nodect = 1
> set queue workq resources_min.walltime = 00:00:01
> set queue workq resources_default.cput = 10000:00:00
> set queue workq resources_default.ncpus = 1
> set queue workq resources_default.nodect = 1
> set queue workq resources_default.walltime = 10000:00:00
> set queue workq resources_available.nodect = 6
> set queue workq enabled = True
> set queue workq started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server default_queue = workq
> set server log_events = 64
> set server mail_from = adm
> set server query_other_jobs = True
> set server resources_available.ncpus = 24
> set server resources_available.nodect = 6
> set server resources_available.nodes = 6
> set server resources_max.ncpus = 24
> set server resources_max.nodes = 6
> set server scheduler_iteration = 60
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server pbs_version = 2.0.0p8
>
>
> I don't know where to look for more information to figure out where the
> problem is.
>
> Any ideas appreciated.
>
> Annette
>
>
> Date: Tue, 9 Oct 2007 11:53:29 -0400
> From: "Michael Edwards" <[EMAIL PROTECTED]>
> Subject: Re: [Oscar-devel] No enough free nodes
>
> Did you install as many nodes as you chose when you defined the cluster?
>
> Can you ping all the nodes you think should be included in the cluster?
>
> What distro and OSCAR version are you using?
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems? Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> Oscar-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/oscar-devel
>