Thank you for your kindly reply Right now no user can run any job on the server. And I can ssh any nodes without password. I tried to reboot all the cluster when the server is running, but it doesn't seem to work.
Here is some debug info as you suggested, I hope this helps :) # checkjob 21208 checking job 21208 State: Idle (User: vuser Group: relab) WallTime: 0:00:00 of INFINITY SubmitTime: Wed Mar 19 09:47:04 (Time Queued Total: 0:02:48 Eligible: 0:00:00) Total Tasks: 1 Req[0] TaskCount: 1 Partition: ALL Network: [NONE] Memory >= 0 Disk >= 0 Swap NC 0 Opsys: [NONE] Arch: [NONE] Class: [workq 1] Features: [nm][fast] IWD: [NONE] Executable: [NONE] QOS: DEFAULT Bypass: 0 StartCount: 0 PartitionMask: [ALL] Flags: RESTARTABLE job is deferred. Reason: NoResources (exceeds available partition procs) Holds: Defer PE: 1.00 StartPriority: 2 cannot select job 21208 for partition DEFAULT (job hold active) # qstat -an parellel: Req'd Req'd Elap Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----- 21208.parellel vuser workq bdimer -- 1 1 -- 10000 Q -- -- # qstat -f Job Id: 21208.parellel Job_Name = bdimer Job_Owner = [EMAIL PROTECTED] job_state = Q queue = workq server = parellel Checkpoint = u ctime = Wed Mar 19 09:47:04 2008 Error_Path = parellel:/home/vuser/pdg/2007fall/1124/bdimer.err Hold_Types = n Join_Path = n Keep_Files = n Mail_Points = a mtime = Wed Mar 19 09:47:04 2008 Output_Path = parellel:/home/vuser/pdg/2007fall/1124/bdimer.out Priority = 0 qtime = Wed Mar 19 09:47:04 2008 Rerunable = True Resource_List.cput = 10000:00:00 Resource_List.ncpus = 1 Resource_List.nodect = 1 Resource_List.nodes = 1:fast:nm Resource_List.walltime = 10000:00:00 Shell_Path_List = /bin/csh Variable_List = PBS_O_HOME=/home/vuser,PBS_O_LANG=en_US.UTF-8, PBS_O_LOGNAME=vuser, PBS_O_PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bi n:/usr/X11R6/bin:/opt/env-switcher/bin:/opt/mpich-1.2.5.10-ch_p4-gcc/bi n:/opt/hdf5-oscar-1.6.0/bin/:/opt/kernel_picker/bin:/opt/pvm3/lib:/opt/ pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/opt/c3-4/:/opt/pbs/bin:/opt/pbs/lib /xpbs/bin:/home/chemtools/g03.c02/g03/bsd:/home/chemtools/g03.c02/g03/p rivate:/home/chemtools/g03.c02/g03:/usr/pgi/linux86/bin:/usr/pgi/linux8 6/lib:/usr/pgi/linux86/include:/opt/maui/bin:/root/bin, PBS_O_MAIL=/var/spool/mail/root,PBS_O_SHELL=/bin/bash, PBS_O_HOST=parellel,PBS_O_WORKDIR=/home/vuser/pdg/2007fall/1124, PBS_O_QUEUE=workq etime = Wed Mar 19 09:47:04 2008 On Wed, Mar 19, 2008 at 3:09 AM, <[EMAIL PROTECTED]> wrote: > Send Oscar-users mailing list submissions to > oscar-users@lists.sourceforge.net > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.sourceforge.net/lists/listinfo/oscar-users > or, via email, send a message with subject or body 'help' to > [EMAIL PROTECTED] > > You can reach the person managing the list at > [EMAIL PROTECTED] > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Oscar-users digest..." > > > Today's Topics: > > 1. How to make job run? (Chunwei Han) > 2. Re: How to make job run? (Michael Edwards) > 3. Re: How to make job run? (Greenseid, Joseph M.) > 4. Re: How to make job run? (arun shankar) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 18 Mar 2008 11:10:37 +0800 > From: "Chunwei Han" <[EMAIL PROTECTED]> > Subject: [Oscar-users] How to make job run? > To: oscar-users@lists.sourceforge.net > Message-ID: > <[EMAIL PROTECTED]> > Content-Type: text/plain; charset=ISO-8859-1 > > Hey guys > > Recently I take over the job as the cluster administrator. Unluckily > the former administrator did not leave any documents and I have no > experience at all. Right now the job can not be submitted. Here is > some debugger info: > > # qstat > Job id Name User Time Use S Queue > ---------------- ---------------- ------------------ -------- - ----- > 21205.parellel bdimer vuser 0 Q workq > 21207.parellel gram0 vuser 0 Q workq > > # tracejob -n 10 21205 > > Job: 21205.parellel > > 03/17/2008 22:23:13 S Job Queued at request of [EMAIL PROTECTED], > owner = [EMAIL PROTECTED], job name = bdimer, queue = workq > 03/17/2008 22:23:13 A queue=workq > > > So, how to make it run? > The version of OSCAR is 3.0 > > > > ------------------------------ > > Message: 2 > Date: Tue, 18 Mar 2008 07:25:30 -0400 > From: "Michael Edwards" <[EMAIL PROTECTED]> > Subject: Re: [Oscar-users] How to make job run? > To: oscar-users@lists.sourceforge.net > Message-ID: > <[EMAIL PROTECTED]> > Content-Type: text/plain; charset=ISO-8859-1 > > It could be any number of things. Are other users able to run jobs on > the cluster? > When you started the cluster, did you boot the head node completely > before booting the cluster nodes? > > Other than that, it would probably be an issue with either your user > permissions (can you ssh to a node without entering a password?) or > your queue script. If you wanted to send your queue script it might > help. > > On Mon, Mar 17, 2008 at 11:10 PM, Chunwei Han <[EMAIL PROTECTED]> wrote: > > Hey guys > > > > Recently I take over the job as the cluster administrator. Unluckily > > the former administrator did not leave any documents and I have no > > experience at all. Right now the job can not be submitted. Here is > > some debugger info: > > > > # qstat > > Job id Name User Time Use S Queue > > ---------------- ---------------- ------------------ -------- - ----- > > 21205.parellel bdimer vuser 0 Q workq > > 21207.parellel gram0 vuser 0 Q workq > > > > # tracejob -n 10 21205 > > > > Job: 21205.parellel > > > > 03/17/2008 22:23:13 S Job Queued at request of [EMAIL PROTECTED], > > owner = [EMAIL PROTECTED], job name = bdimer, queue = workq > > 03/17/2008 22:23:13 A queue=workq > > > > > > So, how to make it run? > > The version of OSCAR is 3.0 > > > > ------------------------------------------------------------------------- > > This SF.net email is sponsored by: Microsoft > > Defy all challenges. Microsoft(R) Visual Studio 2008. > > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > _______________________________________________ > > Oscar-users mailing list > > Oscar-users@lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/oscar-users > > > > > > ------------------------------ > > Message: 3 > Date: Tue, 18 Mar 2008 08:10:30 -0500 > From: "Greenseid, Joseph M." <[EMAIL PROTECTED]> > Subject: Re: [Oscar-users] How to make job run? > To: <oscar-users@lists.sourceforge.net>, > <oscar-users@lists.sourceforge.net> > Message-ID: > <[EMAIL PROTECTED]> > Content-Type: text/plain; charset="iso-8859-1" > > If you're using Maui for scheduling, try running the command `sudo checkjob > [job-id]` (or run as root if you don't have sudo set up). This gives more > info than tracejob, and may tell you something a little more helpful, like > "not enough available resources," etc. > > --Joe > > ________________________________ > > From: [EMAIL PROTECTED] on behalf of Chunwei Han > Sent: Mon 3/17/2008 11:10 PM > To: oscar-users@lists.sourceforge.net > Subject: [Oscar-users] How to make job run? > > > > Hey guys > > Recently I take over the job as the cluster administrator. Unluckily > the former administrator did not leave any documents and I have no > experience at all. Right now the job can not be submitted. Here is > some debugger info: > > # qstat > Job id Name User Time Use S Queue > ---------------- ---------------- ------------------ -------- - ----- > 21205.parellel bdimer vuser 0 Q workq > 21207.parellel gram0 vuser 0 Q workq > > # tracejob -n 10 21205 > > Job: 21205.parellel > > 03/17/2008 22:23:13 S Job Queued at request of [EMAIL PROTECTED], > owner = [EMAIL PROTECTED], job name = bdimer, queue = workq > 03/17/2008 22:23:13 A queue=workq > > > So, how to make it run? > The version of OSCAR is 3.0 > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > Oscar-users mailing list > Oscar-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/oscar-users > > > > > > ------------------------------ > > Message: 4 > Date: Tue, 18 Mar 2008 07:48:13 -0700 (PDT) > From: arun shankar <[EMAIL PROTECTED]> > Subject: Re: [Oscar-users] How to make job run? > To: oscar-users@lists.sourceforge.net > Message-ID: <[EMAIL PROTECTED]> > Content-Type: text/plain; charset="us-ascii" > > Try using the command ''qstat -atn'', which will give you more debugging > information ( extra comment line ), like ''Not Enough Resources Available'' > or ''License not vaild''. Also try using the command ''qstat -f <jobid>'' to > get more info. > > If the comment says ''license not valid'', check for the license file ( i > guess it should be under <HOME directory>/server_priv/license_file if i am > not wrong ) > > Check for the Queue Settings, # qmgr ( Enter )...Check for ''s q workq > started=true and s q workq enabled=true'', if neither of this is false, make > it true. > > Above are some steps you can perform if jobs go into queue state. Hope this > will be useful. > > Regards > Arun > > > > ----- Original Message ---- > From: "Greenseid, Joseph M." <[EMAIL PROTECTED]> > To: oscar-users@lists.sourceforge.net; oscar-users@lists.sourceforge.net > Sent: Tuesday, March 18, 2008 9:10:30 PM > Subject: Re: [Oscar-users] How to make job run? > > If you're using Maui for scheduling, try running the command `sudo checkjob > [job-id]` (or run as root if you don't have sudo set up). This gives more > info than tracejob, and may tell you something a little more helpful, like > "not enough available resources," etc. > > --Joe > > ________________________________ > > From: [EMAIL PROTECTED] on behalf of Chunwei Han > Sent: Mon 3/17/2008 11:10 PM > To: oscar-users@lists.sourceforge.net > Subject: [Oscar-users] How to make job run? > > > > Hey guys > > Recently I take over the job as the cluster administrator. Unluckily > the former administrator did not leave any documents and I have no > experience at all. Right now the job can not be submitted. Here is > some debugger info: > > # qstat > Job id Name User Time Use S Queue > ---------------- ---------------- ------------------ -------- - ----- > 21205.parellel bdimer vuser 0 Q workq > 21207.parellel gram0 vuser 0 Q workq > > # tracejob -n 10 21205 > > Job: 21205.parellel > > 03/17/2008 22:23:13 S Job Queued at request of [EMAIL PROTECTED], > owner = [EMAIL PROTECTED], job name = bdimer, queue = workq > 03/17/2008 22:23:13 A queue=workq > > > So, how to make it run? > The version of OSCAR is 3.0 > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > Oscar-users mailing list > Oscar-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/oscar-users > > > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > Oscar-users mailing list > Oscar-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/oscar-users > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > -------------- next part -------------- > An HTML attachment was scrubbed... > > ------------------------------ > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > ------------------------------ > > _______________________________________________ > Oscar-users mailing list > Oscar-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/oscar-users > > > End of Oscar-users Digest, Vol 22, Issue 7 > ****************************************** > ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ Oscar-users mailing list Oscar-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/oscar-users