Well, notice that according to Torque/PBS, the state of your nodes is
'unknown/down'.  You need to figure out why that's the case and how to
make them 'available' again.
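
As a rough sketch of that first check, the snippet below picks out the
nodes that Torque does not consider usable from `pbsnodes -a`-style text.
The SAMPLE text is made up for illustration (it is not output from this
cluster), and the helper names parse_pbsnodes and unavailable_nodes are
hypothetical, not part of Torque. It assumes the usual layout of a node
name on an unindented line followed by indented "key = value" attributes.

```python
# Sketch: find nodes reported as down/unknown/offline in `pbsnodes -a` text.
# SAMPLE is illustrative data, not real output from the cluster in this thread.

SAMPLE = """\
node1
     state = down
     np = 2
     ntype = cluster

node2
     state = state-unknown,down
     np = 2
     ntype = cluster

node3
     state = free
     np = 2
     ntype = cluster
"""

# State words treated as "not available" (an assumption; adjust as needed).
BAD_STATES = {"down", "offline", "state-unknown", "unknown"}


def parse_pbsnodes(text):
    """Return {node_name: [state, ...]} from `pbsnodes -a`-style text."""
    states = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None          # a blank line ends the current node record
            continue
        if not line[0].isspace():
            current = line.strip()  # an unindented line starts a new node
            states[current] = []
        elif current is not None:
            key, sep, value = line.strip().partition(" = ")
            if sep and key == "state":
                # a node can report several comma-separated states
                states[current] = [s.strip() for s in value.split(",")]
    return states


def unavailable_nodes(states):
    """Return the sorted names of nodes with at least one bad state."""
    return sorted(n for n, st in states.items() if BAD_STATES & set(st))


if __name__ == "__main__":
    for name in unavailable_nodes(parse_pbsnodes(SAMPLE)):
        print(name)
```

To run it against a live server, the text could come from
subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True).stdout
instead of SAMPLE. For any node it lists, a common thing to check is
whether pbs_mom is running on that node and can reach the server.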

Cheers,

Bernard 

> -----Original Message-----
> From: Jeremy Hansen [mailto:[EMAIL PROTECTED] 
> Sent: Friday, April 29, 2005 14:55
> To: Bernard Li
> Cc: [email protected]
> Subject: RE: [Oscar-users] Not enough free nodes. Tests incomplete.
> 
> 
> I also noticed this using Maui's checkjob:
> 
> [EMAIL PROTECTED] bin]# ./checkjob 31
> 
> 
> checking job 31
> 
> State: Idle  (User: oscartst  Group: oscartst)
> WallTime: 0:00:00 of   INFINITY
> SubmitTime: Fri Apr 29 14:52:46
>   (Time Queued  Total: 0:00:23  Eligible: 0:00:00)
> 
> Total Tasks: 1
> 
> Req[0]  TaskCount: 1  Partition: ALL
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap NC 0
> Opsys: [NONE]  Arch: [NONE]  Class: [workq 1]  Features: [NONE]
> 
> 
> IWD: [NONE]  Executable:  [NONE]
> QOS: DEFAULT  Bypass: 0  StartCount: 0
> PartitionMask: [ALL]
> Flags:       RESTARTABLE
> 
> job is deferred.  Reason:  NoResources  (exceeds available partition procs)
> Holds:    Defer
> PE:  1.00  StartPriority:  1
> cannot select job 31 for partition DEFAULT (job hold active)
> 
> Job is deferred because there are no resources???
> 
> -jeremy
> 
> On Fri, 29 Apr 2005, Bernard Li wrote:
> 
> > Hey Jeremy:
> > 
> > What does pbsnodes -a give you?
> > 
> > Also, try running the PBS/Torque GUIs (like xpbs, xpbsmon) and see if
> > the nodes are set up properly...  it seems like the qmaster is set up
> > but none of the execution nodes are set up...
> > 
> > Cheers,
> > 
> > Bernard
> > 
> > > -----Original Message-----
> > > From: Jeremy Hansen [mailto:[EMAIL PROTECTED]
> > > Sent: Friday, April 29, 2005 13:38
> > > To: Bernard Li
> > > Cc: [email protected]
> > > Subject: RE: [Oscar-users] Not enough free nodes. Tests incomplete.
> > > 
> > > On Fri, 29 Apr 2005, Bernard Li wrote:
> > > 
> > > > Hi Jeremy: 
> > > > 
> > > > > on the master node /etc/hosts looks like this:
> > > > > 
> > > > > 10.2.6.199 oscar-control.blah.com oscar-control oscar_server nfs_oscar pbs_oscar
> > > > > 172.21.184.192          oscar-control.blah.com oscar-control
> > > > > 
> > > > > # These entries are managed by SIS, please don't modify them.
> > > > > 10.2.6.1             node1.blah.com  node1
> > > > > 10.2.6.2             node2.blah.com  node2
> > > > > 10.2.6.3             node3.blah.com  node3
> > > > > 10.2.6.4             node4.blah.com  node4
> > > > > 10.2.6.5             node5.blah.com  node5
> > > > > 10.2.6.6             node6.blah.com  node6
> > > > > 10.2.6.7             node7.blah.com  node7
> > > > > 10.2.6.8             node8.blah.com  node8
> > > > > 10.2.6.9             node9.blah.com  node9
> > > > > 10.2.6.10            node10.blah.com node10
> > > > 
> > > > I don't see a 127.0.0.1...  It needs to be there.
> > > 
> > > It's in there...just skipped it on the paste.
> > > 
> > > > > The *.err files in
> > > > > oscartst are zero length.
> > > > > 
> > > > > Where would I find more log files?
> > > > 
> > > > Since these are PBS/Torque errors, one place to look for logs is
> > > > /var/spool/pbs/server_logs.
> > > 
> > > Output from the server_log during the test:
> > > 
> > > 04/29/2005 13:15:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:16:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:17:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:18:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:19:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:20:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:21:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:22:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:23:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:24:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 
> > > 
> > > 
> > > 
> > > 04/29/2005 13:24:23;0002;PBS_Server;Svr;PBS_Server;Server shutdown completed
> > > 04/29/2005 13:24:23;0002;PBS_Server;Svr;Log;Log closed
> > > 04/29/2005 13:24:23;0002;PBS_Server;Svr;Log;Log opened
> > > 04/29/2005 13:24:23;0006;PBS_Server;Svr;PBS_Server;Server oscar-control started, initialization type = 1
> > > 04/29/2005 13:24:23;0002;PBS_Server;Svr;Act;Account file /var/spool/pbs/server_priv/accounting/20050429 opened
> > > 04/29/2005 13:24:23;0040;PBS_Server;Req;setup_nodes;setup_nodes()
> > > 
> > > 04/29/2005 13:24:23;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid = 4391
> > > 04/29/2005 13:24:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command scheduler_first
> > > 04/29/2005 13:25:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:26:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:27:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:28:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 04/29/2005 13:29:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > 
> > > I don't see anything unusual in this log.  One thing I've noticed, and
> > > perhaps this is just my ignorance of how it functions at the moment,
> > > is that my jobs do not appear to get scheduled when submitted.
> > > Just a simple echo script sits in the queue:
> > > 
> > > [EMAIL PROTECTED] oscartst]$ qstat
> > > Job id           Name             User             Time Use S Queue
> > > ---------------- ---------------- ---------------- -------- - -----
> > > 21.oscar-control   test.sh          oscartst                0 Q workq
> > > 22.oscar-control   test.sh          oscartst                0 Q workq
> > > 
> > > 04/29/2005 13:36:59;0040;PBS_Server;Svr;oscar-control;Scheduler sent command new
> > > 
> > > 
> > > > Cheers,
> > > > 
> > > > Bernard
> > > > 
> > > > 
> > > > -------------------------------------------------------
> > > > This SF.Net email is sponsored by: NEC IT Guy Games.
> > > > Get your fingers limbered up and give it your best shot. 4 great
> > > > events, 4 opportunities to win big! Highest score wins.NEC IT Guy
> > > > Games. Play to win an NEC 61 plasma display. Visit
> > > > http://www.necitguy.com/?r
> > > > _______________________________________________
> > > > Oscar-users mailing list
> > > > [email protected]
> > > > https://lists.sourceforge.net/lists/listinfo/oscar-users
> > > > 
> > > 
> > > 
> > 
> 
> 

