Okay, can you run any qconf commands, such as ‘qconf -sconf’? Try having a look at the messages files for the execution daemons. They should be in $SGE_ROOT/default/spool/, which contains directories for the master and the exec hosts (if you have this installed in a shared filesystem environment). You can check both the qmaster messages file and the execd messages files in those directories.

A question: do you have the qmaster running on one host or on many? I noticed that your ps output for compute010 shows it running a qmaster.

Another thing to check is whether all nodes can contact the qmaster machine, i.e. whether the networking is configured properly. You can also make sure that the host naming is correct: either configure DNS properly or configure an /etc/hosts file on all nodes, so that the IP-to-host-name mapping is consistent across the cluster. Grid Engine is very picky about host names.

> On May 30, 2016, at 1:36 PM, Radhouane Aniba <arad...@gmail.com> wrote:
>
> Hi Bill
>
> Yes I am sure
>
> This is what I have when I login to one of the nodes and do
>
> ubuntu@compute010:~$ ps -ef | grep sge_
> sgeadmin  1254     1  0 May28 ?        00:00:39 /usr/lib/gridengine/sge_qmaster
> sgeadmin  1446     1  0 May28 ?        00:00:22 /usr/lib/gridengine/sge_execd
> ubuntu    2552  2527  0 17:36 pts/0    00:00:00 grep --color=auto sge_
>
> On Mon, May 30, 2016 at 10:33 AM, Bill Bryce <bbr...@univa.com> wrote:
> Hi Rad,
>
> Are you sure that the execution daemons are running on your compute nodes? Can you login to one of the nodes say ‘compute001’ and do a ps looking for the execd? When an execd is functioning normally it provides the load and memory, etc… none of your nodes are showing that.
>
> Regards,
>
> Bill.
>
>> On May 30, 2016, at 1:20 PM, Radhouane Aniba <arad...@gmail.com> wrote:
>>
>> Hello all,
>>
>> I am trying to submit a simple "hello world" to test a gridengine (I used it before with no problems)
>>
>> The problem is that my job is waiting in the queue forever
>>
>> The qhost command shows a wired state of the compute nodes
>>
>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> -------------------------------------------------------------------------------
>> global                  -               -     -       -       -       -       -
>> compute001              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute002              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute003              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute004              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute005              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute006              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute007              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute008              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute009              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute010              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute011              lx26-amd64      4     -   31.4G       -     0.0
>>
>> In normal times even when the compute nodes are not used I used to have some information on the load and memuse columns
>>
>> I am not an SGE persons but I am familiar with all the commands, any help would be much appreciated
>>
>> the qstat -f command shows all my nodes in au state. I've been reading a lot about it and I understood its an alarm state (overloaded ?)
>>
>> the only heavy activity I had on the head node was a script downloading 19T of data, could the headnode be the problem and not the compute nodes ?
>>
>> sge_execd is working on all the compute/exec nodes :/
>>
>> --
>> Rad
>> _______________________________________________
>> users mailing list
>> users@gridengine.org
>> https://gridengine.org/mailman/listinfo/users
>
> William Bryce | VP Products
> Univa Corporation, Toronto
> E: bbr...@univa.com | D: 647-9742841 | Toll-Free (800) 370-5320
> W: Univa.com | FB: facebook.com/univa.corporation | T: twitter.com/Grid_Engine
>
> --
> Radhouane Aniba
> Bioinformatics Scientist
> BC Cancer Agency, Vancouver, Canada

William Bryce | VP Products
Univa Corporation, Toronto
E: bbr...@univa.com | D: 647-9742841 | Toll-Free (800) 370-5320
W: Univa.com | FB: facebook.com/univa.corporation | T: twitter.com/Grid_Engine
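
For the suggestion to look at the qmaster and execd messages files, here is a minimal Python sketch that prints the tail of every messages file under the spool directory. It assumes the layout described in the reply ($SGE_ROOT/default/spool/ with one directory per daemon, each holding a file named "messages"); the function names `default_spool_dir` and `tail_messages` are made up for illustration and are not part of Grid Engine. Your installation may keep its spool directory elsewhere.

```python
import os
from collections import deque
from pathlib import Path


def default_spool_dir():
    """Return $SGE_ROOT/default/spool, the location named in the reply."""
    return Path(os.environ["SGE_ROOT"]) / "default" / "spool"


def tail_messages(spool_dir, lines=5):
    """Return {directory name: last `lines` lines of its messages file}.

    Assumption: each daemon directory directly contains a file
    called "messages"; directories without one are skipped.
    """
    tails = {}
    for path in sorted(Path(spool_dir).glob("*/messages")):
        with path.open(errors="replace") as handle:
            # deque(maxlen=...) keeps only the final lines while streaming.
            last = deque(handle, maxlen=lines)
        tails[path.parent.name] = [line.rstrip("\n") for line in last]
    return tails
```

Calling `tail_messages(default_spool_dir())` on a host that can see the shared spool directory returns a dictionary keyed by directory name (for example the qmaster directory and one directory per exec host), which makes it easy to spot which daemons are logging errors.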
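
The host-name advice in the reply (a consistent IP-to-host-name mapping on every node) can be checked with a short Python sketch like the one below. It uses only the standard `socket` and `ipaddress` modules; the function name `check_host` and the exact pass/fail policy (exact name match, loopback addresses rejected) are assumptions chosen for illustration, not Grid Engine's own resolution rules.

```python
import ipaddress
import socket


def check_host(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return (ok, detail) describing how `name` resolves on this machine.

    `forward` and `reverse` default to the system resolver, which also
    consults /etc/hosts; they are parameters so the check can be tested.
    """
    try:
        ip = forward(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"{name}: forward lookup failed ({exc})"
    # A loopback address cannot be reached from the other nodes.
    if ipaddress.ip_address(ip).is_loopback:
        return False, f"{name}: resolves to loopback address {ip}"
    try:
        primary, aliases, _addresses = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"{name}: no reverse mapping for {ip} ({exc})"
    # Illustrative policy: the name must come back exactly as given.
    if name != primary and name not in aliases:
        return False, f"{name}: {ip} maps back to {primary}"
    return True, f"{name}: {ip} maps back consistently"
```

Running `check_host` on every node, for both the qmaster host name and the node's own name, and comparing the results should show whether any node sees a different mapping from the rest of the cluster.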