On Wed, Nov 30, 2016 at 07:14:48PM +0530, Himanshu Joshi wrote:
>    On Wed, Nov 30, 2016 at 7:03 PM, William Hay <w....@ucl.ac.uk> wrote:
> 
>      On Wed, Nov 30, 2016 at 04:50:02PM +0530, Himanshu Joshi wrote:
>      >    On Wed, Nov 30, 2016 at 4:04 PM, William Hay <w....@ucl.ac.uk>
>      wrote:
>      >
>      >      On Tue, Nov 29, 2016 at 10:35:35PM +0530, Himanshu Joshi wrote:
>      >      >    On Tue, Nov 29, 2016 at 8:57 PM, William Hay
>      <w....@ucl.ac.uk>
>      >      wrote:
>      >      >
>      >      >      On Tue, Nov 29, 2016 at 05:43:47PM +0530, Himanshu Joshi
>      wrote:
>      >      >      >    On Tue, Nov 29, 2016 at 5:30 PM, William Hay
>      >      <w....@ucl.ac.uk>
>      >      >      wrote:
>      >      >      >
>      >      >      >      On Tue, Nov 29, 2016 at 03:52:05PM +0530, Himanshu
>      Joshi
>      >      wrote:
>      >      >      >      >    On Mon, Nov 28, 2016 at 9:26 PM, William Hay
>      >      >      <w....@ucl.ac.uk>
>      >      >      >      wrote:
>      >      >      >      >
>      >      >      >      >      On Mon, Nov 28, 2016 at 06:16:00PM +0530,
>      Himanshu
>      >      Joshi
>      >      >      wrote:
>      >      >      >      >      >
>      >      >      >      >      >    Now installation of sge is done
>      >      >      >      >      >
>      >      >      >      >      >    ps aux | grep "sge" command says
>      >      >      >      >      >
>      >      >      >      >      >    root      7407  0.0  0.2 213524 38396 ?        Sl   16:37   0:01 /opt/sge/bin/lx-amd64/sge_qmaster
>      >      >      >      >      >    root      9962  0.0  0.0 112648   960 pts/0    S+   17:53   0:00 grep --color=auto sge
>      >      >      >      >      >    then
>      >      >      >      >      >    I did
>      >      >      >      >      >     service sgeexecd.mbialjpj55 start
>      >      >      >      >      >       Starting Grid Engine execution
>      daemon
>      >      >      >      >      >
>      >      >      >      >      >    but
>      >      >      >      >      >    ps aux | grep "sge" again says the same
>      status
>      >      >      >      >      >
>      >      >      >      >      >    root      7407  0.0  0.2 213524 38396 ?        Sl   16:37   0:01 /opt/sge/bin/lx-amd64/sge_qmaster
>      >      >      >      >      >    root      9974  0.0  0.0 112648   960 pts/0    S+   17:54   0:00 grep --color=auto sge
>      >      >      >      >      >    I would now setup SGE
>      >      >      >      >
>      >      >      >      >      sge_execd should be running.
>      >      >      >      >
>      >      >      >      >      As root try /bin/sh -x
>      >      /etc/init.d/sgeexecd.mbilajpj55
>      >      >      start
>      >      >      >      >
>      >      >      >      >
>      >      >      >      >    [root@mbialjpj ~]# /bin/sh -x
>      >      >      /etc/init.d/sgeexecd.mbilajpj55 start
>      >      >      >      >    /bin/sh: /etc/init.d/sgeexecd.mbilajpj55: No
>      such
>      >      file or
>      >      >      directory
>      >      >      >
>      >      >      >      Try again but with the obvious typo corrected.
>      >      >      >
>      >      >      >    sorry for the typo error
>      >      >      >     Here is the output...
>      >      >      I think the original typo is mine.
>      >      >      Looks like everything should work.
>      >      >
>      >      >      can you try:
>      >      >      cat /opt/sge/default/spool/mbialjpj/messages
>      >      >
>      >      >    It says
>      >      >
>      >      >
>      >      >      cat: /opt/sge/default/spool/mbialjpj/messages: No such
>      file or
>      >      directory
>      >      >
>      >      >      The log file may contain clues as to why it died/failed to
>      start.
>      >      >
>      >      >    I think you are looking for
>      /opt/sge/default/spool/qmaster/messages
>      >      >    please find it attached
>      >      >    Regards
>      >      No I was hoping for the execd messages file.  I think it should
>      be in
>      >      the location I specified.
>      >      Can you have a look under /opt/sge/default/spool for other
>      directories
>      >      and see if they have a
>      >      messages file somewhere under them somewhere?
>      >
>      >    No I have rechecked .. and there is nothing in
>      /opt/sge/default/spool
>      >    folder by name "messages" . And surprisingly this folder has only
>      one
>      >    folder and that is "qmaster"
>      >    And there is nothing by the name "messages" even in the main
>      directory
>      >    "/opt"  i.e in even in opt folder the desired file is not found.
>      >    Regards
>      >    Himanshu
> 
>      In that case I would try running the sge_execd binary directly.
> 
>      With the sge environment loaded set the SGE_ND environment variable to
>      true (which will keep the daemon in the foreground)
> 
>    How to set that?
It looks like we have the proximate cause of the problem without it, so
you probably won't need to. However, for the sake of completeness:

IIRC root on your machine uses csh for some reason, in which case
setenv SGE_ND true
should do it. (A plain "set SGE_ND=true" only creates a csh shell
variable, which sge_execd would not inherit.)
If on the other hand you are using bash then
export SGE_ND=true
If you are using some other Bourne-like shell then
SGE_ND=true
export SGE_ND

should suffice to set it.
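
Put together for a Bourne-style shell, an untested sketch of the whole
step might look like the following. The sge_execd path is an assumption
taken from the ps output quoted earlier in the thread; adjust it if your
install differs. The sh -c line is only there to confirm that a child
process really sees the variable.

```shell
# Set and export SGE_ND so that child processes inherit it.
SGE_ND=true
export SGE_ND

# A child shell sees the variable only because it was exported.
sh -c 'echo "SGE_ND in child: ${SGE_ND:-unset}"'

# Assumed location of the binary (from the ps output earlier in the thread).
EXECD=/opt/sge/bin/lx-amd64/sge_execd

if [ -x "$EXECD" ]; then
    # With SGE_ND=true the daemon should stay attached to this terminal,
    # so any startup errors are printed here. Stop it with Ctrl-C.
    "$EXECD"
else
    echo "sge_execd not found at $EXECD" >&2
fi
```

Run it as root from a shell that already has the SGE environment loaded.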


>    because currently
>    /opt/sge/bin/lx-amd64/sge_execd says
>    error: communication error for "mbialjpj/execd/1" running on port 1024:
>    "can't bind socket"
>    error: commlib error: can't bind socket (no additional information
>    available)
>    ..........................
>    critical error: abort qmaster registration due to communication errors
>    daemonize error: child exited before sending daemonize state

Did you run the command as root?  If so then that sounds like something else
is using the port that sge_execd is trying to use.  Also 1024 isn't the
default port for sge_execd (that is normally 6445, with sge_qmaster on 6444).
Did you deliberately set it to something unusual when running inst_sge?
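
To see which ports the installation thinks it should use, something like
the sketch below may help. It assumes the usual arrangement in which the
ports come either from the SGE_QMASTER_PORT and SGE_EXECD_PORT environment
variables or from sge_qmaster and sge_execd entries in /etc/services.
show_sge_ports is just an illustrative helper name, not part of SGE.

```shell
# Print the SGE port settings from the environment and from a services file.
# The optional first argument overrides the services file (default /etc/services).
show_sge_ports() {
    services_file=${1:-/etc/services}
    echo "SGE_QMASTER_PORT=${SGE_QMASTER_PORT:-unset}"
    echo "SGE_EXECD_PORT=${SGE_EXECD_PORT:-unset}"
    # Show any sge_qmaster / sge_execd service entries, or say there are none.
    grep -E '^sge_(qmaster|execd)[[:space:]]' "$services_file" 2>/dev/null \
        || echo "no sge_qmaster/sge_execd entries in $services_file"
}

show_sge_ports
```

Run it in the same shell (with the SGE environment loaded) from which you
start sge_execd, so that the variables reflect what the daemon would see.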

If you have fuser installed then something like fuser -v 1024/tcp should
give you the name of the process that is listening there.

Otherwise netstat -ltnp or ss -ltnp will list all listening processes and
you can check the output for whatever is listening on port 1024.
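
If you would rather not read through the whole listing, a rough sketch
along these lines filters it down to one port. filter_port, list_listeners
and who_listens are made-up helper names for illustration. Run it as root,
otherwise the process names of other users' sockets are usually not shown.

```shell
# Keep only lines whose address column ends in :PORT (reads a listing on stdin).
filter_port() {
    grep -E ":$1[[:space:]]"
}

# List listening TCP sockets with whichever tool is installed.
list_listeners() {
    if command -v ss >/dev/null 2>&1; then
        ss -ltnp
    elif command -v netstat >/dev/null 2>&1; then
        netstat -ltnp
    else
        return 1
    fi
}

# Report what, if anything, is listening on the given TCP port.
who_listens() {
    listing=$(list_listeners) || return 1
    printf '%s\n' "$listing" | filter_port "$1" \
        || echo "nothing is listening on TCP port $1"
}

who_listens 1024 || echo "neither ss nor netstat is available" >&2
```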


William

_______________________________________________
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
