Title: RE: [Oscar-users] PVM jobs need to be forced with qrun to run !
I would wait for 4.2.1 - beta5 should be out real soon now (TM).
 
You might try restarting PBS and MAUI and see if the problem goes away.
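When restarting, it may help to do it in a fixed order, so that MAUI reconnects to a freshly started server. Below is a minimal sketch. The init script names `maui` and `pbs_server` under /etc/init.d are assumptions about this OSCAR install (check what is actually there), and `restart_batch_system` is a hypothetical helper. INIT_DIR can be overridden.

```shell
#!/bin/sh
# Hypothetical helper: restart TORQUE, then MAUI, in a fixed order.
# ASSUMPTION: the init scripts are named "maui" and "pbs_server";
# adjust INIT_DIR or the names to match the head node.
INIT_DIR=${INIT_DIR:-/etc/init.d}

restart_batch_system() {
    # Stop the scheduler first so it is not talking to a dying server
    # (its exit status is ignored; it may already be down).
    "$INIT_DIR/maui" stop
    # Restart the TORQUE server; give up if it does not come back.
    "$INIT_DIR/pbs_server" restart || return 1
    # Start MAUI against the new server instance.
    "$INIT_DIR/maui" start
}
```

Run it as root and then try `showq`. If MAUI still reports "lost connection to scheduler", the logs are the next place to look.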
 
Cheers,
 
Bernard


From: Jay Jay [mailto:[EMAIL PROTECTED]
Sent: Thu 16/02/2006 22:37
To: Bernard Li
Cc: [email protected]
Subject: RE: [Oscar-users] PVM jobs need to be forced with qrun to run !

Yeah, I checked the logs. PBS keeps crashing:

.. Connection refused (111) in contact_sched, Could not contact Scheduler ..

and MAUI keeps crashing and refuses to restart, e.g.:
   [EMAIL PROTECTED]  bin]# ./showstats
   ERROR:    lost connection to scheduler
   02/17 11:21:53 ERROR:    cannot request service (status)

Even when PBS is working, MAUI keeps messing it up.
Running /etc/init.d/maui restart doesn't work either.
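To see how often pbs_server fails to reach the scheduler, the error quoted above can be counted in the TORQUE server log. This is a sketch that assumes PBS_HOME is /var/spool/pbs and that the daily logs are in server_logs/YYYYMMDD. Both are assumptions about this installation. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: count "Could not contact Scheduler" errors in one day's TORQUE
# server log. ASSUMPTION: logs live in $PBS_HOME/server_logs/YYYYMMDD.
PBS_HOME=${PBS_HOME:-/var/spool/pbs}

server_log_for() {
    # $1: date as YYYYMMDD (defaults to today)
    printf '%s/server_logs/%s\n' "$PBS_HOME" "${1:-$(date +%Y%m%d)}"
}

count_sched_errors() {
    # Print how many scheduler-contact failures the given day's log holds.
    log=$(server_log_for "$1")
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    # grep -c prints 0 (and exits 1) when nothing matches; keep status 0.
    grep -c 'Could not contact Scheduler' "$log" || true
}
```

A steadily rising count while MAUI is "running" would suggest the scheduler process is dying or not listening, rather than a problem with the jobs themselves.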

I am thinking of upgrading to CentOS 3.6 and OSCAR 4.2. Would that be a good
idea?
--
regards.


>From: "Bernard Li" <[EMAIL PROTECTED]>
>To: "X Y" <[EMAIL PROTECTED]>
>CC: <[email protected]>
>Subject: RE: [Oscar-users] PVM jobs need to be forced with qrun to run !
>Date: Thu, 16 Feb 2006 22:14:30 -0800
>
>Have you checked your TORQUE/MAUI logs for more information?
>
>It is difficult to troubleshoot any further without any additional info.
>
>Cheers,
>
>Bernard
>
>________________________________
>
>From: X Y [mailto:[EMAIL PROTECTED]]
>Sent: Thu 16/02/2006 06:33
>To: Bernard Li
>Cc: [email protected]
>Subject: RE: [Oscar-users] PVM jobs need to be forced with qrun to run !
>
>
>
>Hi Bernard,
>
>Now even MPI jobs just sit there, along with the PVM jobs; both have to be
>started with qrun as root.
>What's going on? Can TORQUE/MAUI change their behaviour on their own over
>time?
>My server is reasonably secure, so I highly doubt there has been a security
>breach.
>
>By the way, what would be a quick short-term solution, other than me sitting
>at the terminal qrun'ing users' jobs? Can qmgr be useful here? Can you
>suggest a quick fix (syntax etc.)?
>---
>Regards
>
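As a stop-gap for the qrun question, the manual qrun step can be scripted. This minimal sketch uses `qselect -s Q` to list queued job ids and force-starts each one. `qrun` bypasses the scheduler entirely (no fairness, no resource checks) and needs PBS manager/operator rights, so this only hides the MAUI problem. The helper name is hypothetical.

```shell
#!/bin/sh
# Stop-gap sketch only: force-start every job in the Queued (Q) state.
# This bypasses the scheduler; it does not fix whatever broke MAUI.

qrun_all_queued() {
    # qselect -s Q prints one job id per line; the optional $1
    # restricts the selection to a single queue (e.g. workq).
    if [ -n "$1" ]; then
        ids=$(qselect -q "$1" -s Q)
    else
        ids=$(qselect -s Q)
    fi
    for id in $ids; do
        qrun "$id" && echo "started $id"
    done
}
```

Usage, as root: `qrun_all_queued workq`. It could also be run from root's crontab every few minutes while the scheduler is being repaired.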
>
> >From: "Bernard Li" <[EMAIL PROTECTED]>
> >To: "X Y" <[EMAIL PROTECTED]>, <[email protected]>
> >Subject: RE: [Oscar-users] PVM jobs need to be forced with qrun to run !
> >Date: Wed, 15 Feb 2006 22:10:18 -0800
> >
> >So does the job just sit there in the queue and not run?  Do the logs
> >(TORQUE, MAUI) say anything?
> >
> >Cheers,
> >
> >Bernard
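Whether a job is "just sitting there" can be checked per job. This sketch reads `job_state` from `qstat -f` and, if the job is still queued, runs MAUI's `checkjob`, which reports why the scheduler is not starting it. The helper names are hypothetical, and the parsing assumes the usual `job_state = Q` layout of `qstat -f`.

```shell
#!/bin/sh
# Sketch: show a job's TORQUE state and, if still queued, ask MAUI why.

job_state_from_qstat() {
    # Reads `qstat -f` output on stdin; attributes look like
    # "    job_state = Q". Prints the first job_state value found.
    awk -F' = ' '$1 ~ /^[ \t]*job_state$/ { print $2; exit }'
}

why_idle() {
    # $1: job id
    state=$(qstat -f "$1" | job_state_from_qstat)
    echo "$1: state=$state"
    if [ "$state" = "Q" ]; then
        # checkjob is MAUI's per-job diagnostic (holds, blocking reasons).
        checkjob "$1"
    fi
}
```

If `checkjob` itself fails with a connection error, that points back at the MAUI daemon rather than at the job.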
> >
> >________________________________
> >
> >From: [EMAIL PROTECTED] on behalf of X Y
> >Sent: Wed 15/02/2006 02:43
> >To: [email protected]
> >Subject: [Oscar-users] PVM jobs need to be forced with qrun to run !
> >
> >
> >
> >
> >   Hi,
> >   My cluster specs/config:
> >          OSCAR version: 4.1
> >          OS: Red Hat 9 (x86)
> >          Default OSCAR installation
> >          Compute nodes: 32
> >
> >   I'm able to run my MPI jobs fine. As soon as I qsub my MPI jobs, they get
> >   queued in the default queue (workq) and run.
> >   But my PVM jobs won't run unless I su to root and manually (forcefully)
> >   qrun them. So I doubt the problem is related to resources_default.nodes
> >   being set, since the MPI ones are running fine.
> >   (By the way, that's set with qmgr, right?) The PVM PBS job script is
> >   attached below, just in case.
> >   Any suggestions/ideas are welcome.
> >   Regards
> >   --
> >   SD.
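`resources_default.nodes` is indeed set with qmgr, either on the server or on a queue. Because qmgr accepts one directive per line on stdin, a small helper can emit the directives. This is a sketch: the helper name is hypothetical, and the queue name and node count are parameters, not values known for this cluster.

```shell
#!/bin/sh
# Sketch: emit qmgr directives that set and then display a queue's
# default node count. Pipe the output into qmgr as root.

default_nodes_directives() {
    # $1: queue name, $2: default nodes value (e.g. 1)
    printf 'set queue %s resources_default.nodes = %s\n' "$1" "$2"
    printf 'print queue %s\n' "$1"
}
```

Usage: inspect first with `qmgr -c "print queue workq"`, then apply with `default_nodes_directives workq 1 | qmgr`.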
> >
> >
> >
> >   pvmpbscript:
> >   [EMAIL PROTECTED] server_priv]# cat /home/oscartst/pbs_script.pvm
> >   ************************************
> >   #!/bin/sh
> >
> >   ### Job name
> >   #PBS -N pvmtest
> >
> >   ### Output files
> >   #PBS -o pvmtest.out
> >   #PBS -e pvmtest.err
> >
> >   ### Queue name
> >   #PBS -q workq
> >
> >   ### Script Commands
> >   cd $PBS_O_WORKDIR
> >
> >   # generate pvm nodes file
> >   echo "* ep=$PBS_O_WORKDIR wd=$PBS_O_WORKDIR" > pvm_nodes
> >   cat $PBS_NODEFILE >> pvm_nodes
> >
> >   # start pvm daemon & wait for slave daemons to start up
> >   pvmd pvm_nodes &
> >   #sleep 10
> >
> >   # run job
> >   p=`pwd`
> >   cp master1.c slave1.c /tmp
> >   cd /tmp
> >   gcc -I$PVM_ROOT/include master1.c -L$PVM_ROOT/lib/$PVM_ARCH -lpvm3 -o master1
> >   gcc -I$PVM_ROOT/include slave1.c -L$PVM_ROOT/lib/$PVM_ARCH -lpvm3 -o slave1
> >   cp master1 slave1 $p
> >   cd $p
> >   ./master1
> >
> >   # wait again to make sure everyone's finished
> >   # then kill master pvm daemon
> >   #sleep 5
> >   /usr/bin/killall -TERM pvmd3
> >
> >   # get rid of lock files & nodes file
> >   uid=`id -u`
> >   tail -n +2 "$PBS_NODEFILE" > pvm_nodes
> >   /bin/rm -f /tmp/pvm?.$uid
> >   crm  pvm_nodes:/tmp/pvmd.$uid > /dev/null 2>&1
> >   crm  pvm_nodes:/tmp/pvml.$uid > /dev/null 2>&1
> >   /bin/rm -f pvm_nodes
> >   exit
> >   *************************************
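The script starts `pvmd pvm_nodes &` and then skips the commented-out `sleep 10`. As a result, `./master1` can start before the daemon is ready. One option is to poll for the daemon's address file instead of sleeping a fixed time. This is a sketch under the assumption that the master pvmd creates /tmp/pvmd.<uid> once it is up (the same file the script's cleanup removes). That only shows the master daemon started, not that every slave daemon did. `wait_for_file` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: wait up to N seconds for a file to appear.
# ASSUMPTION: /tmp/pvmd.<uid> appearing means the master pvmd is up.

wait_for_file() {
    # $1: path to wait for, $2: timeout in seconds (default 30)
    waited=0
    limit=${2:-30}
    while [ ! -e "$1" ]; do
        [ "$waited" -ge "$limit" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the job script, after `pvmd pvm_nodes &`: `wait_for_file /tmp/pvmd.$(id -u) 30 || exit 1`. A stale file left by an earlier crashed run would make this return immediately, so the end-of-job cleanup still matters.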
> >
> >
> >
> >
> >_______________________________________________
> >Oscar-users mailing list
> >[email protected]
> >https://lists.sourceforge.net/lists/listinfo/oscar-users
> >
> >
>
>
>
>
