Thank you for your kind replies.

Right now no user can run any job on the server. I can ssh to any
node without a password. I also tried rebooting the whole cluster while
the server was running, but that did not help.

Here is the debug info you suggested. I hope it helps :)

# checkjob 21208

checking job 21208

State: Idle  (User: vuser  Group: relab)
WallTime: 0:00:00 of   INFINITY
SubmitTime: Wed Mar 19 09:47:04
  (Time Queued  Total: 0:02:48  Eligible: 0:00:00)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap NC 0
Opsys: [NONE]  Arch: [NONE]  Class: [workq 1]  Features: [nm][fast]


IWD: [NONE]  Executable:  [NONE]
QOS: DEFAULT  Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Flags:       RESTARTABLE

job is deferred.  Reason:  NoResources  (exceeds available partition procs)
Holds:    Defer
PE:  1.00  StartPriority:  2
cannot select job 21208 for partition DEFAULT (job hold active)


# qstat -an

parellel:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
21208.parellel  vuser    workq    bdimer        --    1   1    --  10000 Q   --
    --

# qstat -f
Job Id: 21208.parellel
    Job_Name = bdimer
    Job_Owner = [EMAIL PROTECTED]
    job_state = Q
    queue = workq
    server = parellel
    Checkpoint = u
    ctime = Wed Mar 19 09:47:04 2008
    Error_Path = parellel:/home/vuser/pdg/2007fall/1124/bdimer.err
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Wed Mar 19 09:47:04 2008
    Output_Path = parellel:/home/vuser/pdg/2007fall/1124/bdimer.out
    Priority = 0
    qtime = Wed Mar 19 09:47:04 2008
    Rerunable = True
    Resource_List.cput = 10000:00:00
    Resource_List.ncpus = 1
    Resource_List.nodect = 1
    Resource_List.nodes = 1:fast:nm
    Resource_List.walltime = 10000:00:00
    Shell_Path_List = /bin/csh
    Variable_List = PBS_O_HOME=/home/vuser,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=vuser,
        PBS_O_PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bi
        n:/usr/X11R6/bin:/opt/env-switcher/bin:/opt/mpich-1.2.5.10-ch_p4-gcc/bi
        n:/opt/hdf5-oscar-1.6.0/bin/:/opt/kernel_picker/bin:/opt/pvm3/lib:/opt/
        pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/opt/c3-4/:/opt/pbs/bin:/opt/pbs/lib
        /xpbs/bin:/home/chemtools/g03.c02/g03/bsd:/home/chemtools/g03.c02/g03/p
        rivate:/home/chemtools/g03.c02/g03:/usr/pgi/linux86/bin:/usr/pgi/linux8
        6/lib:/usr/pgi/linux86/include:/opt/maui/bin:/root/bin,
        PBS_O_MAIL=/var/spool/mail/root,PBS_O_SHELL=/bin/bash,
        PBS_O_HOST=parellel,PBS_O_WORKDIR=/home/vuser/pdg/2007fall/1124,
        PBS_O_QUEUE=workq
    etime = Wed Mar 19 09:47:04 2008
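
The job above requests nodes=1:fast:nm and checkjob defers it with
NoResources, so one possible next check is whether any node is currently
both free and tagged with those properties. Below is a rough sketch of that
check. It assumes `pbsnodes -a` prints each node name at the start of a
line, followed by indented `state = ...` and `properties = ...` lines; the
helper name free_nodes_with is made up for illustration and is not part of
PBS or Maui.

```shell
#!/bin/sh
# Hypothetical helper: print the nodes that are in state "free" and that
# advertise every property given as an argument.
# Input (stdin): text laid out like `pbsnodes -a` output (assumed layout:
# node name in column 1, then indented "state = X" / "properties = a,b").
free_nodes_with() {
    awk -v want="$*" '
        # Decide whether the node collected so far qualifies, and print it.
        function emit(   i, n, w, ok) {
            if (name == "") return
            n = split(want, w, " ")
            ok = (state == "free")
            for (i = 1; i <= n; i++)
                if (index("," props ",", "," w[i] ",") == 0) ok = 0
            if (ok) print name
        }
        # A line starting in column 1 begins a new node record.
        /^[^ \t]/             { emit(); name = $1; state = ""; props = "" }
        /^[ \t]+state =/      { state = $3 }
        /^[ \t]+properties =/ { props = $3 }
        END                   { emit() }
    '
}
```

Usage would be `pbsnodes -a | free_nodes_with fast nm`. An empty result
would be consistent with the NoResources deferral, although it would not by
itself prove that this is the cause.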



On Wed, Mar 19, 2008 at 3:09 AM,
<[EMAIL PROTECTED]> wrote:
> Send Oscar-users mailing list submissions to
>         oscar-users@lists.sourceforge.net
>
>  To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.sourceforge.net/lists/listinfo/oscar-users
>  or, via email, send a message with subject or body 'help' to
>         [EMAIL PROTECTED]
>
>  You can reach the person managing the list at
>         [EMAIL PROTECTED]
>
>  When replying, please edit your Subject line so it is more specific
>  than "Re: Contents of Oscar-users digest..."
>
>
>  Today's Topics:
>
>    1. How to make job run? (Chunwei Han)
>    2. Re: How to make job run? (Michael Edwards)
>    3. Re: How to make job run? (Greenseid, Joseph M.)
>    4. Re: How to make job run? (arun shankar)
>
>
>  ----------------------------------------------------------------------
>
>  Message: 1
>  Date: Tue, 18 Mar 2008 11:10:37 +0800
>  From: "Chunwei Han" <[EMAIL PROTECTED]>
>  Subject: [Oscar-users] How to make job run?
>  To: oscar-users@lists.sourceforge.net
>  Message-ID:
>         <[EMAIL PROTECTED]>
>  Content-Type: text/plain; charset=ISO-8859-1
>
>  Hey guys
>
>  I recently took over as the cluster administrator. Unfortunately,
>  the former administrator did not leave any documentation and I have no
>  experience at all. Right now submitted jobs never run; they just stay
>  queued. Here is some debug info:
>
>  # qstat
>  Job id           Name             User               Time Use S Queue
>  ---------------- ---------------- ------------------ -------- - -----
>  21205.parellel   bdimer           vuser                   0 Q workq
>  21207.parellel   gram0            vuser                   0 Q workq
>
>  # tracejob -n 10 21205
>
>  Job: 21205.parellel
>
>  03/17/2008 22:23:13  S    Job Queued at request of [EMAIL PROTECTED],
>  owner = [EMAIL PROTECTED], job name = bdimer, queue = workq
>  03/17/2008 22:23:13  A    queue=workq
>
>
>  So, how can I make it run?
>  The OSCAR version is 3.0.
>
>
>
>  ------------------------------
>
>  Message: 2
>  Date: Tue, 18 Mar 2008 07:25:30 -0400
>  From: "Michael Edwards" <[EMAIL PROTECTED]>
>  Subject: Re: [Oscar-users] How to make job run?
>  To: oscar-users@lists.sourceforge.net
>  Message-ID:
>         <[EMAIL PROTECTED]>
>  Content-Type: text/plain; charset=ISO-8859-1
>
>  It could be any number of things.  Are other users able to run jobs on
>  the cluster?
>  When you started the cluster, did you boot the head node completely
>  before booting the cluster nodes?
>
>  Other than that, it would probably be an issue with either your user
>  permissions (can you ssh to a node without entering a password?) or
>  your queue script.  If you wanted to send your queue script it might
>  help.
>
>
>  ------------------------------
>
>  Message: 3
>  Date: Tue, 18 Mar 2008 08:10:30 -0500
>  From: "Greenseid, Joseph M." <[EMAIL PROTECTED]>
>  Subject: Re: [Oscar-users] How to make job run?
>  To: <oscar-users@lists.sourceforge.net>,
>         <oscar-users@lists.sourceforge.net>
>  Message-ID:
>         <[EMAIL PROTECTED]>
>  Content-Type: text/plain;       charset="iso-8859-1"
>
>  If you're using Maui for scheduling, try running the command
>  `sudo checkjob [job-id]` (or run as root if you don't have sudo set up).
>  This gives more info than tracejob, and may tell you something a little
>  more helpful, like "not enough available resources," etc.
>
>  --Joe
>
>
>  ------------------------------
>
>  Message: 4
>  Date: Tue, 18 Mar 2008 07:48:13 -0700 (PDT)
>  From: arun shankar <[EMAIL PROTECTED]>
>  Subject: Re: [Oscar-users] How to make job run?
>  To: oscar-users@lists.sourceforge.net
>  Message-ID: <[EMAIL PROTECTED]>
>  Content-Type: text/plain; charset="us-ascii"
>
>  Try using the command 'qstat -atn', which will give you more debugging
>  information (an extra comment line), like 'Not Enough Resources Available'
>  or 'License not valid'. Also try using the command 'qstat -f <jobid>' to
>  get more info.
>
>  If the comment says 'license not valid', check for the license file (I
>  guess it should be under <HOME directory>/server_priv/license_file, if I
>  am not wrong).
>
>  Check the queue settings: run qmgr (press Enter) and check that workq has
>  started=true and enabled=true. If either of them is false, set it to true
>  with 's q workq started=true' and 's q workq enabled=true'.
>
>  Above are some steps you can perform if jobs stay in the queued state.
>  Hope this will be useful.
>
>  Regards
>  Arun
>
>  End of Oscar-users Digest, Vol 22, Issue 7
>  ******************************************
>

_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users
