On 21 Jun 2006 at 10:46, Bernard Li wrote:

> Anything from the TORQUE logs?

I can't find any log file named TORQUE, but this is from pbs_server.log.
Again, there are only entries from when the job is started:
06/21/2006 12:19:20;0040;PBS_Server;Svr;master;Scheduler sent command time
06/21/2006 12:19:46;0008;PBS_Server;Job;19.master;Job Queued at request of [EMAIL PROTECTED], owner = [EMAIL PROTECTED], job name = fds_mpi, queue = parallel
06/21/2006 12:19:46;0040;PBS_Server;Svr;master;Scheduler sent command new
06/21/2006 12:19:47;0008;PBS_Server;Job;19.master;Job Modified at request of [EMAIL PROTECTED]
06/21/2006 12:19:47;0008;PBS_Server;Job;19.master;Job Run at request of [EMAIL PROTECTED]
06/21/2006 12:19:47;0008;PBS_Server;Job;19.master;Job Modified at request of [EMAIL PROTECTED]
06/21/2006 12:20:46;0040;PBS_Server;Svr;master;Scheduler sent command time
06/21/2006 12:21:46;0040;PBS_Server;Svr;master;Scheduler sent command time
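
In case it helps narrow this down, here is a minimal sketch of filtering the server log for a single job, so that a missing end-of-job record stands out. It assumes each record is one line with six semicolon-separated fields, as in the excerpt above, and that wrapped lines have been rejoined first. The field names (timestamp, code, daemon, kind, ident, message) are my own labels for illustration, not official TORQUE terminology.

```python
def parse_pbs_line(line):
    """Split one pbs_server.log record into six semicolon-separated fields.

    The key names are illustrative labels only (an assumption), chosen to
    match the layout seen in the excerpt above.
    """
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None  # not a complete record (e.g. a wrapped or blank line)
    keys = ("timestamp", "code", "daemon", "kind", "ident", "message")
    return dict(zip(keys, parts))


def job_events(lines, job_id):
    """Return the parsed records of kind 'Job' that belong to job_id."""
    records = (parse_pbs_line(line) for line in lines)
    return [r for r in records
            if r and r["kind"] == "Job" and r["ident"] == job_id]


# Three records copied from the excerpt above.
sample = [
    "06/21/2006 12:19:20;0040;PBS_Server;Svr;master;Scheduler sent command time",
    "06/21/2006 12:19:46;0040;PBS_Server;Svr;master;Scheduler sent command new",
    "06/21/2006 12:19:47;0008;PBS_Server;Job;19.master;Job Run at request of [EMAIL PROTECTED]",
]

events = job_events(sample, "19.master")
for event in events:
    print(event["timestamp"], event["message"])
```

Running the same filter over the whole log file, rather than this three-line sample, would show at a glance whether anything is recorded for the job after the "Job Run" entry.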

> 
> Cheers,
> 
> Bernard 
> 
> > -----Original Message-----
> > From: John Meskes [mailto:[EMAIL PROTECTED] 
> > Sent: Wednesday, June 21, 2006 10:40
> > To: Bernard Li
> > Cc: [email protected]
> > Subject: RE: [Oscar-devel] job status
> > 
> > On 21 Jun 2006 at 8:55, Bernard Li wrote:
> > 
> > > Have you looked through the TORQUE/Maui logs to check if anything is out
> > > of the ordinary?
> > 
> > maui.log grows at 1M per minute. I don't see any unusual entries around
> > the time the job should have finished.
> > Here's what I see when the job starts (/opt/maui/maui.log):
> > 06/21 12:19:47 INFO:     48 feasible tasks found for job 19:0 in partition DEFAULT (10 Needed)
> > 06/21 12:19:47 INFO:     tasks located for job 19:  10 of 10 required (16 feasible)
> > 06/21 12:19:47 MJobStart(19)
> > 06/21 12:19:47 MJobDistributeTasks(19,RHEL4U2-X86.BCGSC.CA,NodeList,TaskMap)
> > 06/21 12:19:47 MAMAllocJReserve(19,RIndex,ErrMsg)
> > 06/21 12:19:47 MRMJobStart(19,Msg,SC)
> > 06/21 12:19:47 MPBSJobStart(19,RHEL4U2-X86.BCGSC.CA,Msg,SC)
> > 06/21 12:19:47 MPBSJobModify(19,Resource_List,Resource,atarnode25.atar+atarnode24.atar+atarnode23.atar+atarnode22.atar+atarnode21.atar+atarnode20.atar+atarnode19.atar+atarnode18.atar+atarnode17.atar+atarnode16.atar)
> > 06/21 12:19:47 MPBSJobModify(19,Resource_List,Resource,10:ppn=1)
> > 06/21 12:19:47 INFO:     job '19' successfully started
> > 06/21 12:19:47 MStatUpdateActiveJobUsage(19)
> > 06/21 12:19:47 MResJCreate(19,MNodeList,00:00:00,ActiveJob,Res)
> > 06/21 12:19:47 INFO:     starting job '19'
> > 06/21 12:19:47 INFO:     1 jobs started on iteration 288
> > Active Jobs------
> > ------------------
> > 06/21 12:19:47 INFO:     resources available after scheduling: N: 6  P: 6
> > ...skipping
> > 06/21 12:19:58 INFO:     PBS node atarnode16.atar set to state Busy (job-exclusive)
> > 06/21 12:19:58 INFO:     node 'atarnode16.atar' changed states from Idle to Busy
> > 06/21 12:19:58 ALERT:    unexpected node transition on node 'atarnode16.atar'  Idle -> Busy
> > 06/21 12:19:58 MPBSNodeUpdate(atarnode16.atar,atarnode16.atar,Busy,RHEL4U2-X86.BCGSC.CA)
> > 06/21 12:19:58 INFO:     node atarnode16.atar has joblist '0/19.master'
> > 06/21 12:19:58 INFO:     job 19 adds 1 processors per task to node atarnode16.atar (1)
> > 06/21 12:19:58 MPBSLoadQueueInfo(RHEL4U2-X86.BCGSC.CA,atarnode16.atar,SC)
> > 
> > 
> > > With trunk, I believe that the dnsdomainname of client nodes is not
> > > correctly set (i.e. if you run "hostname" on your client, it does not
> > > show the FQDN).
> > 
> > I can ping by either name.
> > 
> > > Not sure if this is related though...
> > > 
> > > Cheers,
> > > 
> > > Bernard
> > > 
> > > > -----Original Message-----
> > > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Meskes
> > > > Sent: Wednesday, June 21, 2006 8:51
> > > > To: [email protected]
> > > > Subject: Re: [Oscar-devel] job status
> > > > 
> > > > Using Oscar5(r5000) on CentOS4.3
> > > > I still have a problem with qstat.
> > > > 
> > > > 1-When a job completes successfully, it is not removed from the queue
> > > > (although the walltime does stop accumulating)
> > > > If it dies due to a restriction such as memory limit exceeded, it is removed.
> > > > 
> > > > 2-Cannot qstat from a node
> > > > pbs_iff: Access from host not allowed, or unknown host
> > > > No Permission.
> > > > qstat: cannot connect to server pbs_oscar (errno=15007)
> > > > 
> > > > On 6 Jun 2006 at 15:16, John Meskes wrote:
> > > > 
> > > > > I need help with another few problems:
> > > > > using CentOS, and nightly 4.2.1r4598-20060417, with upgraded lam-7.1.2
> > > > > and torque-2.0.0p8-2
> > > > > 
> > > > > 1-after a job finishes, it stays in the qstat listing
> > > > >  - I don't see an ending entry in the pbs/server_priv/accounting/ file
> > > > > 
> > > > > 2-ganglia has gaps in the graphs. (See attached if it works)
> > > > > 
> > > > > Is there a nightly tarball for OSCAR-5 that's close to production status?
> > > > > ...John.
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > _______________________________________________
> > > > Oscar-devel mailing list
> > > > [email protected]
> > > > https://lists.sourceforge.net/lists/listinfo/oscar-devel
> > > > 
> > 
> > 
> > 




