On Wed, Nov 30, 2016 at 07:14:48PM +0530, Himanshu Joshi wrote:
> On Wed, Nov 30, 2016 at 7:03 PM, William Hay <w....@ucl.ac.uk> wrote:
> > On Wed, Nov 30, 2016 at 04:50:02PM +0530, Himanshu Joshi wrote:
> > On Wed, Nov 30, 2016 at 4:04 PM, William Hay <w....@ucl.ac.uk> wrote:
> > On Tue, Nov 29, 2016 at 10:35:35PM +0530, Himanshu Joshi wrote:
> > > On Tue, Nov 29, 2016 at 8:57 PM, William Hay <w....@ucl.ac.uk> wrote:
> > > On Tue, Nov 29, 2016 at 05:43:47PM +0530, Himanshu Joshi wrote:
> > > > On Tue, Nov 29, 2016 at 5:30 PM, William Hay <w....@ucl.ac.uk> wrote:
> > > > On Tue, Nov 29, 2016 at 03:52:05PM +0530, Himanshu Joshi wrote:
> > > > > On Mon, Nov 28, 2016 at 9:26 PM, William Hay <w....@ucl.ac.uk> wrote:
> > > > > On Mon, Nov 28, 2016 at 06:16:00PM +0530, Himanshu Joshi wrote:
> > > > > > Now installation of sge is done
> > > > > >
> > > > > > ps aux | grep "sge" command says
> > > > > >
> > > > > > root 7407 0.0 0.2 213524 38396 ? Sl 16:37 0:01 /opt/sge/bin/lx-amd64/sge_qmaster
> > > > > > root 9962 0.0 0.0 112648 960 pts/0 S+ 17:53 0:00 grep --color=auto sge
> > > > > > then
> > > > > > I did
> > > > > > service sgeexecd.mbialjpj55 start
> > > > > > Starting Grid Engine execution daemon
> > > > > >
> > > > > > but
> > > > > > ps aux | grep "sge" again says the same status
> > > > > >
> > > > > > root 7407 0.0 0.2 213524 38396 ? Sl 16:37 0:01 /opt/sge/bin/lx-amd64/sge_qmaster
> > > > > > root 9974 0.0 0.0 112648 960 pts/0 S+ 17:54 0:00 grep --color=auto sge
> > > > > > I would now setup SGE
> > > > >
> > > > > sge_execd should be running.
> > > > >
> > > > > As root try /bin/sh -x /etc/init.d/sgeexecd.mbilajpj55 start
> > > > >
> > > > > [root@mbialjpj ~]# /bin/sh -x /etc/init.d/sgeexecd.mbilajpj55 start
> > > > > /bin/sh: /etc/init.d/sgeexecd.mbilajpj55: No such file or directory
> > > >
> > > > Try again but with the obvious typo corrected.
> > > >
> > > > sorry for the typo error
> > > > Here is the output...
> > >
> > > I think the original typo is mine.
> > > Looks like everything should work.
> > >
> > > can you try:
> > > cat /opt/sge/default/spool/mbialjpj/messages
> > >
> > > It says
> > >
> > > cat: /opt/sge/default/spool/mbialjpj/messages: No such file or directory
> > >
> > > The log file may contain clues as to why it died/failed to start.
> > >
> > > I think you are looking for /opt/sge/default/spool/qmaster/messages
> > > please find it attached
> > > Regards
> >
> > No I was hoping for the execd messages file. I think it should be in
> > the location I specified.
> > Can you have a look under /opt/sge/default/spool for other directories
> > and see if they have a
> > messages file somewhere under them somewhere?
> >
> > No I have rechecked .. and there is nothing in /opt/sge/default/spool
> > folder by name "messages" . And surprisingly this folder has only one
> > folder and that is "qmaster"
> > And there is nothing by the name "messages" even in the main directory
> > "/opt" i.e in even in opt folder the desired file is not found.
> > Regards
> > Himanshu
> >
> > In that case I would try running the sge_execd binary directly.
> > With the sge environment loaded set the SGE_ND environment variable to
> > true (which will keep the daemon in the foreground)
>
> How to set that?

It looks like we have the proximate cause of the problem without it so you probably won't need to however for the sake of completeness:
IIRC root on your machine uses csh for some reason, in which case

    setenv SGE_ND true

should do it. If on the other hand you are using bash, then

    export SGE_ND=true

If you are using some other Bourne-like shell, then

    SGE_ND=true
    export SGE_ND

should suffice to set it.

> because currently
> /opt/sge/bin/lx-amd64/sge_execd says
> error: communication error for "mbialjpj/execd/1" running on port 1024:
> "can't bind socket"
> error: commlib error: can't bind socket (no additional information
> available)
> ..........................
> critical error: abort qmaster registration due to communication errors
> daemonize error: child exited before sending daemonize state

Did you run the command as root? If so, then that sounds like something
else is using the port that sge_execd is trying to use. Also, 1024 isn't
the default port for sge_execd (that is normally 6445). Did you
deliberately set it to something unusual when running inst_sge?

If you have fuser installed, then something like

    fuser -v 1024/tcp

should give you the name of the process that is listening there.
Otherwise

    netstat -ltnp

or

    ss -ltnp

will list all listening processes, and you can check the output for
whatever is listening on port 1024.

William
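P.S. If it is easier to follow as a script, the checks above could be
sketched roughly as below. This is only an illustration for a Bourne-like
shell, not something I have run on your machine. The function names are
made up for this mail, and the settings.sh location assumes the cell is
"default" under /opt/sge, as the paths earlier in the thread suggest.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; paths and the port number
# are taken from this thread and may differ on other installations.

# Keep only the lines of `ss -ltn`-style output (read from stdin) whose
# local address, the 4th column, ends in :PORT.
listeners_on_port() {
    awk -v port="$1" '$4 ~ (":" port "$")'
}

# Show where the execd port might be coming from: the SGE_EXECD_PORT
# environment variable, or an sge_execd entry in the services database.
show_execd_port() {
    echo "SGE_EXECD_PORT=${SGE_EXECD_PORT:-unset}"
    getent services sge_execd || echo "no sge_execd entry in services"
}

# Run sge_execd in the foreground so that its errors appear on the terminal.
# Assumption: the settings file lives under /opt/sge/default/common.
run_execd_foreground() {
    . /opt/sge/default/common/settings.sh
    SGE_ND=true
    export SGE_ND
    /opt/sge/bin/lx-amd64/sge_execd
}

# Typical use, as root:
#   show_execd_port
#   ss -ltnp | listeners_on_port 1024
#   run_execd_foreground
```

The filter matches on the end of the local-address column, so a listener
on, say, port 10240 is not mistaken for one on 1024.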
_______________________________________________
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss