From: Jay Jay [mailto:[EMAIL PROTECTED]]
Sent: Thu 16/02/2006 22:37
To: Bernard Li
Cc: [email protected]
Subject: RE: [Oscar-users] PVM jobs need to be forced with qrun to run !
Yeah, I checked the logs. PBS keeps on crashing:
..
Connection refused (111) in contact_sched, Could not contact Scheduler
..
and Maui keeps on crashing and refusing to restart, i.e.
[EMAIL PROTECTED] bin]# ./showstats
ERROR: lost connection to scheduler
02/17 11:21:53 ERROR: cannot request service (status)
Even when PBS is working, Maui keeps on messing it up.
/etc/init.d/maui restart won't work either.
I am thinking of upgrading to CentOS 3.6 and OSCAR 4.2. Would that be a good
idea?
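
The "Connection refused (111) in contact_sched" line suggests that pbs_server tried to reach the scheduler and nothing accepted the connection, which would fit Maui being down at that moment. A minimal sketch for counting how often that message appears in a server log follows. The helper name `count_sched_errors` and the log path in the usage comment are assumptions for illustration, not details taken from this thread.

```shell
#!/bin/sh
# Count log lines reporting that pbs_server could not reach the scheduler.
# count_sched_errors is a hypothetical helper; pass it any server log file.
count_sched_errors() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's status at 0.
    grep -c 'Could not contact Scheduler' "$1" || true
}

# Example (the log location is an assumption, adjust it for your install):
# count_sched_errors /var/spool/pbs/server_logs/20060217
```

Comparing the count before and after restarting Maui may help tell a one-off failure from a scheduler that keeps dying.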
--
regards.
>From: "Bernard Li" <[EMAIL PROTECTED]>
>To: "X Y" <[EMAIL PROTECTED]>
>CC: <[email protected]>
>Subject: RE: [Oscar-users] PVM jobs need to be forced with qrun to run !
>Date: Thu, 16 Feb 2006 22:14:30 -0800
>
>Have you checked your TORQUE/MAUI logs for more information?
>
>It is difficult to troubleshoot any further without any additional info.
>
>Cheers,
>
>Bernard
>
>________________________________
>
>From: X Y [mailto:[EMAIL PROTECTED]]
>Sent: Thu 16/02/2006 06:33
>To: Bernard Li
>Cc: [email protected]
>Subject: RE: [Oscar-users] PVM jobs need to be forced with qrun to run !
>
>
>
>Hi Bernard,
>
>Now even MPI jobs just sit there, along with the PVM jobs. Both have to be
>qrun as root.
>What's going on? I mean, can TORQUE/Maui change behaviour on their own over
>time?
>My server is reasonably secure. I highly doubt there has been a security
>breach or anything like that.
>
>Btw, what could be quick short-term solutions other than me sitting at the
>terminal qrun'ing users' jobs? Can the qmgr thing be useful? Can you suggest
>a quick fix (syntax etc.)?
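
As a stopgap for the manual qrun'ing described above, the queued jobs can be forced in one pass. This is only a sketch that works around the symptom, since jobs are normally started by the scheduler. `qselect` and `qrun` are standard TORQUE client commands, while the helper name `run_queued_jobs` and the `DRY_RUN` variable are made up for this example.

```shell
#!/bin/sh
# Force-run every job currently in the queued (Q) state.
# run_queued_jobs and DRY_RUN are hypothetical names for this sketch.
# qrun needs manager or operator rights, so this is meant to be run as root.
run_queued_jobs() {
    # qselect -s Q prints one job id per line for jobs in state Q
    qselect -s Q | while read -r jobid; do
        [ -n "$jobid" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would run: qrun $jobid"
        else
            qrun "$jobid" || echo "qrun failed for $jobid" >&2
        fi
    done
}

# Example: preview first, then run for real.
# DRY_RUN=1; run_queued_jobs
# DRY_RUN=0; run_queued_jobs
```

On the qmgr side, it may also be worth checking that `qmgr -c 'print server'` shows `scheduling = True`, since the server only contacts the scheduler when that attribute is set.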
>---
>Regards
>
>
> >From: "Bernard Li" <[EMAIL PROTECTED]>
> >To: "X Y" <[EMAIL PROTECTED]>, <[email protected]>
> >Subject: RE: [Oscar-users] PVM jobs need to be forced with qrun to run !
> >Date: Wed, 15 Feb 2006 22:10:18 -0800
> >
> >So does the job just sit there in the queue and not run? Do the logs
> >(TORQUE, MAUI) say anything?
> >
> >Cheers,
> >
> >Bernard
> >
> >________________________________
> >
> >From: [EMAIL PROTECTED] on behalf of X Y
> >Sent: Wed 15/02/2006 02:43
> >To: [email protected]
> >Subject: [Oscar-users] PVM jobs need to be forced with qrun to run !
> >
> > Hi,
> > My cluster specs/config:
> >   Oscar version: 4.1
> >   OS           : Redhat 9 (x86)
> >   with default Oscar installation
> >   Compute Nodes: 32 nodes
> >
> > I'm able to run my MPI jobs fine. As soon as I qsub my MPI jobs they get
> > queued up in the default queue (workq) and run.
> > But my PVM jobs won't run unless I su to root and manually (forcefully)
> > qrun them. So I doubt the problem is related to resources_default.nodes
> > being set, as the MPI ones are running fine
> > (btw it's set with qmgr, right?). The PVM PBS job script is attached below
> > just in case.
> > Any suggestions/ideas are welcome.
> > Regards
> > --
> > SD.
> >
> > pvmpbscript:
> > [EMAIL PROTECTED] server_priv]# cat /home/oscartst/pbs_script.pvm
> > ************************************
> > #!/bin/sh
> >
> > ### Job name
> > #PBS -N pvmtest
> >
> > ### Output files
> > #PBS -o pvmtest.out
> > #PBS -e pvmtest.err
> >
> > ### Queue name
> > #PBS -q workq
> >
> > ### Script Commands
> > cd $PBS_O_WORKDIR
> >
> > # generate pvm nodes file
> > echo "* ep=$PBS_O_WORKDIR wd=$PBS_O_WORKDIR" > pvm_nodes
> > cat $PBS_NODEFILE >> pvm_nodes
> >
> > # start pvm daemon & wait for slave daemons to start up
> > pvmd pvm_nodes &
> > #sleep 10
> >
> > # run job
> > p=`pwd`
> > cp master1.c slave1.c /tmp
> > cd /tmp
> > gcc -I$PVM_ROOT/include master1.c -L$PVM_ROOT/lib/$PVM_ARCH -lpvm3 -o master1
> > gcc -I$PVM_ROOT/include slave1.c -L$PVM_ROOT/lib/$PVM_ARCH -lpvm3 -o slave1
> > cp master1 slave1 $p
> > cd $p
> > ./master1
> >
> > # wait again to make sure everyone's finished
> > # then kill master pvm daemon
> > #sleep 5
> > /usr/bin/killall -TERM pvmd3
> >
> > # get rid of lock files & nodes file
> > uid=`id -u`
> > tail +2 $PBS_NODEFILE > pvm_nodes
> > /bin/rm -f /tmp/pvm?.$uid
> > crm pvm_nodes:/tmp/pvmd.$uid > /dev/null 2>&1
> > crm pvm_nodes:/tmp/pvml.$uid > /dev/null 2>&1
> > /bin/rm -f pvm_nodes
> > exit
> > *************************************
> >
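
The "generate pvm nodes file" step in the script above copies $PBS_NODEFILE as-is. That file has one line per allocated processor, so a host can appear more than once when several processors per node are requested. A minimal sketch of the same step with repeated hostnames dropped is shown below; the helper name `make_pvm_hostfile` is an assumption for illustration and is not part of OSCAR or PVM.

```shell
#!/bin/sh
# Build a PVM hostfile from a PBS node file, listing each host once.
# make_pvm_hostfile is a hypothetical helper name for this sketch.
#   $1 = node file (e.g. "$PBS_NODEFILE"), $2 = working directory
make_pvm_hostfile() {
    nodefile=$1
    workdir=$2
    # A "*" line sets default options (ep=, wd=) for the hosts listed after it
    echo "* ep=$workdir wd=$workdir"
    # Keep the first occurrence of each hostname, preserving the order
    awk 'NF && !seen[$1]++ { print $1 }' "$nodefile"
}

# Example, inside the job script:
# make_pvm_hostfile "$PBS_NODEFILE" "$PBS_O_WORKDIR" > pvm_nodes
```

This does not address the jobs staying queued, which happens before the script ever runs; it only tidies the hostfile once a job does start.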
> >_______________________________________________
> >Oscar-users mailing list
> >[email protected]
> >https://lists.sourceforge.net/lists/listinfo/oscar-users
