sinfo reports that both of the nodes in your gov_part partition are down.
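
If it helps, here is a rough sketch of how you could pick the down nodes out of the sinfo output and then look into why they were marked down. The helper name down_nodes is just something I made up for illustration, and it assumes the default sinfo column layout (PARTITION AVAIL TIMELIMIT NODES STATE NODELIST). The commands in the comments are standard Slurm commands, but I have not run them against your cluster.

```shell
# Hypothetical helper: print "partition nodelist" for every sinfo line
# whose STATE column starts with "down" (this also matches "down*").
# Assumes default-format sinfo output on stdin:
#   PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
down_nodes() {
    awk '$5 ~ /^down/ { print $1, $6 }'
}

# On a live cluster (not executed here):
#   sinfo | down_nodes
#   sinfo -R                  # list the reason recorded for each down node
#   scontrol show node z23    # full node record, including Reason=
#   scontrol update NodeName=z[23-24] State=RESUME   # only once the cause is fixed
```

Note that State=UP in the scontrol output refers to the partition itself. The nodes have their own state, which is what sinfo shows in that column.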

Quoting Paul Thirumalai <[email protected]>:

The output of scontrol show part gov_part is the following:

show part gov_part
PartitionName=gov_part
   AllocNodes=ALL AllowGroups=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO Hidden=NO
   MaxNodes=1 MaxTime=UNLIMITED MinNodes=1
   Nodes=z[23,24]
   Priority=1 RootOnly=NO Shared=YES:4 PreemptMode=OFF
   State=UP TotalCPUs=8 TotalNodes=2


The output of sinfo is (just showing the relevant line):
gov_part     up   infinite      2   down z[23-24]


I don't think the partition is down.

On Mon, May 30, 2011 at 5:58 AM, Pär Andersson <[email protected]> wrote:

Paul Thirumalai <[email protected]> writes:

> However the partition named gov_part is not working. When I submit a job to
> that partition I get the following response:
> /usr/bin/srun -N1 --partition=gov_part /home/kdsd03/oas/klurm/testing/test_time.py
> srun: Requested partition configuration not available now
> srun: job 24483 queued and waiting for resources
>
> After that, when I hit Ctrl-C, I get the following lines in stdout:
> srun: Force Terminated job 24483
> srun: Job allocation 24483 has been revoked.

You will get that message if the state of the partition is down. What is
the output of "scontrol show part gov_part" and "sinfo"?

/Pär Andersson
NSC






