On 21 Jun 2006 at 10:46, Bernard Li wrote:

> Anything from the TORQUE logs?

can't find any log file by the name TORQUE, but this is from pbs_server.log again, only entries when the job is started

06/21/2006 12:19:20;0040;PBS_Server;Svr;master;Scheduler sent command time
06/21/2006 12:19:46;0008;PBS_Server;Job;19.master;Job Queued at request of [EMAIL PROTECTED], owner = [EMAIL PROTECTED], job name = fds_mpi, queue = parallel
06/21/2006 12:19:46;0040;PBS_Server;Svr;master;Scheduler sent command new
06/21/2006 12:19:47;0008;PBS_Server;Job;19.master;Job Modified at request of [EMAIL PROTECTED]
06/21/2006 12:19:47;0008;PBS_Server;Job;19.master;Job Run at request of [EMAIL PROTECTED]
06/21/2006 12:19:47;0008;PBS_Server;Job;19.master;Job Modified at request of [EMAIL PROTECTED]
06/21/2006 12:20:46;0040;PBS_Server;Svr;master;Scheduler sent command time
06/21/2006 12:21:46;0040;PBS_Server;Svr;master;Scheduler sent command time

>
> Cheers,
>
> Bernard
>
> > -----Original Message-----
> > From: John Meskes [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, June 21, 2006 10:40
> > To: Bernard Li
> > Cc: [email protected]
> > Subject: RE: [Oscar-devel] job status
> >
> > On 21 Jun 2006 at 8:55, Bernard Li wrote:
> >
> > > Have you looked through the TORQUE/Maui logs to check if anything is out
> > > of the ordinary?
> >
> > maui.log grows at 1M per minute. I don't see any different entries at the supposed job
> > finishing time.
> > Here's what I see when the job starts...
> > /opt/maui/maui.log
> > 06/21 12:19:47 INFO: 48 feasible tasks found for job 19:0 in partition DEFAULT (10 Needed)
> > 06/21 12:19:47 INFO: tasks located for job 19: 10 of 10 required (16 feasible)
> > 06/21 12:19:47 MJobStart(19)
> > 06/21 12:19:47 MJobDistributeTasks(19,RHEL4U2-X86.BCGSC.CA,NodeList,TaskMap)
> > 06/21 12:19:47 MAMAllocJReserve(19,RIndex,ErrMsg)
> > 06/21 12:19:47 MRMJobStart(19,Msg,SC)
> > 06/21 12:19:47 MPBSJobStart(19,RHEL4U2-X86.BCGSC.CA,Msg,SC)
> > 06/21 12:19:47 MPBSJobModify(19,Resource_List,Resource,atarnode25.atar+atarnode24.atar+atarnode23.atar+atarnode22.atar+atarnode21.atar+atarnode20.atar+atarnode19.atar+atarnode18.atar+atarnode17.atar+atarnode16.atar)
> > 06/21 12:19:47 MPBSJobModify(19,Resource_List,Resource,10:ppn=1)
> > 06/21 12:19:47 INFO: job '19' successfully started
> > 06/21 12:19:47 MStatUpdateActiveJobUsage(19)
> > 06/21 12:19:47 MResJCreate(19,MNodeList,00:00:00,ActiveJob,Res)
> > 06/21 12:19:47 INFO: starting job '19'
> > 06/21 12:19:47 INFO: 1 jobs started on iteration 288
> > Active Jobs------ ------------------
> > 06/21 12:19:47 INFO: resources available after scheduling: N: 6 P: 6
> > ...skipping
> > 06/21 12:19:58 INFO: PBS node atarnode16.atar set to state Busy (job-exclusive)
> > 06/21 12:19:58 INFO: node 'atarnode16.atar' changed states from Idle to Busy
> > 06/21 12:19:58 ALERT: unexpected node transition on node 'atarnode16.atar' Idle -> Busy
> > 06/21 12:19:58 MPBSNodeUpdate(atarnode16.atar,atarnode16.atar,Busy,RHEL4U2-X86.BCGSC.CA)
> > 06/21 12:19:58 INFO: node atarnode16.atar has joblist '0/19.master'
> > 06/21 12:19:58 INFO: job 19 adds 1 processors per task to node atarnode16.atar (1)
> > 06/21 12:19:58 MPBSLoadQueueInfo(RHEL4U2-X86.BCGSC.CA,atarnode16.atar,SC)
> >
> >
> > > With trunk, I believe that the dnsdomainname of client nodes are not
> > > correctly set (i.e.
> > > if you run "hostname" on your client, it does not
> > > show FQDN).
> >
> > I can ping by either name.
> >
> > > Not sure if this is related though...
> > >
> > > Cheers,
> > >
> > > Bernard
> > >
> > > > -----Original Message-----
> > > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Meskes
> > > > Sent: Wednesday, June 21, 2006 8:51
> > > > To: [email protected]
> > > > Subject: Re: [Oscar-devel] job status
> > > >
> > > > Using Oscar5(r5000) on CentOS4.3
> > > > I still have a problem with qstat.
> > > >
> > > > 1-When a job completes successfully, it is not removed from the queue
> > > > (although the walltime does stop accumulating)
> > > > If it dies due to a restriction such as memory limit
> > > > exceeded, it is removed.
> > > >
> > > > 2-Cannot qstat from a node
> > > > pbs_iff: Access from host not allowed, or unknown host
> > > > No Permission.
> > > > qstat: cannot connect to server pbs_oscar (errno=15007)
> > > >
> > > > On 6 Jun 2006 at 15:16, John Meskes wrote:
> > > >
> > > > > I need help with another few problems:
> > > > > using CentOS, and nightly 4.2.1r4598-20060417, with upgraded lam-7.1.2
> > > > > and torque-2.0.0p8-2
> > > > >
> > > > > 1-after a job finishes, it stays in the qstat listing
> > > > > - I don't see an ending entry in the pbs/server_priv/accounting/ file
> > > > >
> > > > > 2-ganglia has gaps in the graphs. (See attached if it works)
> > > > >
> > > > > Is there a nightly tarball for OSCAR-5 that's close to production status?
> > > > > ...John.
> > > >
> > > > _______________________________________________
> > > > Oscar-devel mailing list
> > > > [email protected]
> > > > https://lists.sourceforge.net/lists/listinfo/oscar-devel
