Hi Jeremy:

Can you post your /var/spool/pbs/mom_priv/config?

Thanks!

Bernard 
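
For anyone reading this thread in the archive: the fix quoted below was to make mom_priv/config name the server by a name that resolves to the private interface. A minimal sketch of how that could be checked, assuming the config uses the Torque `$clienthost` or `$pbsserver` directive and that the cluster network is 10.2.6.0/24 (the netmask is not stated in the thread). The function names and the host `public.example.com` are hypothetical, made up for illustration.

```python
import ipaddress

# Illustrative data only: public.example.com is a made-up name, and the
# /24 mask used below is an assumption, not something stated in the thread.
SAMPLE_HOSTS = """\
10.2.6.199      oscar-control pbs_oscar
172.21.184.192  public.example.com
10.2.6.1        node1
"""


def parse_hosts(text):
    """Map each name or alias in /etc/hosts-style text to its first address."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment, or address without a name
        for name in fields[1:]:
            table.setdefault(name, fields[0])
    return table


def mom_server_names(config_text):
    """Return the hostnames given to $clienthost / $pbsserver lines."""
    names = []
    for line in config_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] in ("$clienthost", "$pbsserver"):
            names.append(fields[1])
    return names


def off_network(config_text, hosts_text, private_net):
    """List (name, address) pairs from the config that are not on private_net.

    A name missing from the hosts text is reported with address None.
    """
    net = ipaddress.ip_network(private_net)
    hosts = parse_hosts(hosts_text)
    bad = []
    for name in mom_server_names(config_text):
        addr = hosts.get(name)
        if addr is None or ipaddress.ip_address(addr) not in net:
            bad.append((name, addr))
    return bad


if __name__ == "__main__":
    print(off_network("$clienthost public.example.com\n", SAMPLE_HOSTS, "10.2.6.0/24"))
    print(off_network("$clienthost pbs_oscar\n", SAMPLE_HOSTS, "10.2.6.0/24"))
```

An empty list means every server name in the config resolves to the private network. This only reads text you pass in; it does not query DNS or touch pbs_mom.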

> -----Original Message-----
> From: Jeremy Hansen [mailto:[EMAIL PROTECTED] 
> Sent: Friday, April 29, 2005 15:51
> To: Bernard Li
> Cc: [email protected]
> Subject: RE: [Oscar-users] Not enough free nodes. Tests incomplete.
> 
> 
> I fixed the problem it seems.
> 
> /var/spool/pbs/mom_priv/config had the external interface on the master
> node defined and not the internal private address.  It should have just
> used pbs_oscar for the hostname but instead used the fqdn of the public
> interface.  Perhaps a bug?
> 
> Changed this, restarted pbs_mom on all nodes and now I get 
> what I expect.
> 
> Thanks
> -jeremy
> 
> On Fri, 29 Apr 2005, Bernard Li wrote:
> 
> > Well notice that according to Torque/PBS - the state of your nodes is
> > 'unknown/down'.  You need to figure out why that's the case and how to
> > make them become 'available' again.
> > 
> > Cheers,
> > 
> > Bernard
> > 
> > > -----Original Message-----
> > > From: Jeremy Hansen [mailto:[EMAIL PROTECTED]
> > > Sent: Friday, April 29, 2005 14:55
> > > To: Bernard Li
> > > Cc: [email protected]
> > > Subject: RE: [Oscar-users] Not enough free nodes. Tests 
> incomplete.
> > > 
> > > 
> > > I also noticed this using maui's checkjob
> > > 
> > > [EMAIL PROTECTED] bin]# ./checkjob 31
> > > 
> > > 
> > > checking job 31
> > > 
> > > State: Idle  (User: oscartst  Group: oscartst)
> > > WallTime: 0:00:00 of   INFINITY
> > > SubmitTime: Fri Apr 29 14:52:46
> > >   (Time Queued  Total: 0:00:23  Eligible: 0:00:00)
> > > 
> > > Total Tasks: 1
> > > 
> > > Req[0]  TaskCount: 1  Partition: ALL
> > > Network: [NONE]  Memory >= 0  Disk >= 0  Swap NC 0
> > > Opsys: [NONE]  Arch: [NONE]  Class: [workq 1]  Features: [NONE]
> > > 
> > > 
> > > IWD: [NONE]  Executable:  [NONE]
> > > QOS: DEFAULT  Bypass: 0  StartCount: 0
> > > PartitionMask: [ALL]
> > > Flags:       RESTARTABLE
> > > 
> > > > job is deferred.  Reason:  NoResources  (exceeds available partition procs)
> > > Holds:    Defer
> > > PE:  1.00  StartPriority:  1
> > > cannot select job 31 for partition DEFAULT (job hold active)
> > > 
> > > Job is deferred because there are no resources???
> > > 
> > > -jeremy
> > > 
> > > On Fri, 29 Apr 2005, Bernard Li wrote:
> > > 
> > > > Hey Jeremy:
> > > > 
> > > > What does pbsnodes -a give you?
> > > > 
> > > > > Also, try running the PBS/Torque GUI's (like xpbs, xpbsmon) and see if
> > > > > the nodes are set up properly...  it seems like the qmaster is set up
> > > > > but none of the execution nodes are set up...
> > > > 
> > > > Cheers,
> > > > 
> > > > Bernard
> > > > 
> > > > > -----Original Message-----
> > > > > From: Jeremy Hansen [mailto:[EMAIL PROTECTED]
> > > > > Sent: Friday, April 29, 2005 13:38
> > > > > To: Bernard Li
> > > > > Cc: [email protected]
> > > > > Subject: RE: [Oscar-users] Not enough free nodes. Tests
> > > incomplete.
> > > > > 
> > > > > On Fri, 29 Apr 2005, Bernard Li wrote:
> > > > > 
> > > > > > Hi Jeremy: 
> > > > > > 
> > > > > > > on the master node /etc/hosts looks like this:
> > > > > > > 
> > > > > > > 10.2.6.199 oscar-control.blah.com oscar-control oscar_server nfs_oscar pbs_oscar
> > > > > > > 172.21.184.192          oscar-control.blah.com oscar-control
> > > > > > > 
> > > > > > > # These entries are managed by SIS, please don't modify them.
> > > > > > > 10.2.6.1             node1.blah.com  node1
> > > > > > > 10.2.6.2             node2.blah.com  node2
> > > > > > > 10.2.6.3             node3.blah.com  node3
> > > > > > > 10.2.6.4             node4.blah.com  node4
> > > > > > > 10.2.6.5             node5.blah.com  node5
> > > > > > > 10.2.6.6             node6.blah.com  node6
> > > > > > > 10.2.6.7             node7.blah.com  node7
> > > > > > > 10.2.6.8             node8.blah.com  node8
> > > > > > > 10.2.6.9             node9.blah.com  node9
> > > > > > > 10.2.6.10            node10.blah.com node10
> > > > > > 
> > > > > > I don't see a 127.0.0.1...  It needs to be there.
> > > > > 
> > > > > It's in there...just skipped it on the paste.
> > > > > 
> > > > > > > The *.err files in
> > > > > > > oscartst are zero length.
> > > > > > > 
> > > > > > > Where would I find more log files?
> > > > > > 
> > > > > > Since these are PBS/Torque errors, one place to look for logs is
> > > > > > /var/spool/pbs/server_logs.
> > > > > 
> > > > > Output from the server_log during the test:
> > > > > 
> > > > > 04/29/2005 13:15:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:16:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:17:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:18:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:19:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:20:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:21:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:22:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:23:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:24:15;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 04/29/2005 13:24:23;0002;PBS_Server;Svr;PBS_Server;Server shutdown completed
> > > > > 04/29/2005 13:24:23;0002;PBS_Server;Svr;Log;Log closed
> > > > > 04/29/2005 13:24:23;0002;PBS_Server;Svr;Log;Log opened
> > > > > 04/29/2005 13:24:23;0006;PBS_Server;Svr;PBS_Server;Server oscar-control started, initialization type = 1
> > > > > 04/29/2005 13:24:23;0002;PBS_Server;Svr;Act;Account file /var/spool/pbs/server_priv/accounting/20050429 opened
> > > > > 04/29/2005 13:24:23;0040;PBS_Server;Req;setup_nodes;setup_nodes()
> > > > > 04/29/2005 13:24:23;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid = 4391
> > > > > 04/29/2005 13:24:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command scheduler_first
> > > > > 04/29/2005 13:25:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:26:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:27:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:28:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 04/29/2005 13:29:23;0040;PBS_Server;Svr;oscar-control;Scheduler sent command time
> > > > > 
> > > > > I don't see anything unusual in this log.  One thing I've noticed, and
> > > > > perhaps this is just my ignorance on how it functions at the moment,
> > > > > but my submitted jobs do not appear to get scheduled when submitted.
> > > > > Just a simple echo script sits in the queue:
> > > > > 
> > > > > [EMAIL PROTECTED] oscartst]$ qstat
> > > > > Job id           Name             User             Time Use S Queue
> > > > > ---------------- ---------------- ---------------- -------- - -----
> > > > > 21.oscar-control test.sh          oscartst                0 Q workq
> > > > > 22.oscar-control test.sh          oscartst                0 Q workq
> > > > > 
> > > > > 04/29/2005 13:36:59;0040;PBS_Server;Svr;oscar-control;Scheduler sent command new
> > > > > 
> > > > > 
> > > > > > Cheers,
> > > > > > 
> > > > > > Bernard
> > > > > > 
> > > > > > 
> > > > > > -------------------------------------------------------
> > > > > > This SF.Net email is sponsored by: NEC IT Guy Games.
> > > > > > Get your fingers limbered up and give it your best
> > > shot. 4 great
> > > > > > events, 4 opportunities to win big! Highest score
> > > wins.NEC IT Guy
> > > > > > Games. Play to win an NEC 61 plasma display. Visit 
> > > > > > http://www.necitguy.com/?r 
> > > > > > _______________________________________________
> > > > > > Oscar-users mailing list
> > > > > > [email protected] 
> > > > > > https://lists.sourceforge.net/lists/listinfo/oscar-users
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> > 
> 
> 


