On Wed, Nov 30, 2016 at 9:37 PM, William Hay <w....@ucl.ac.uk> wrote:

> On Wed, Nov 30, 2016 at 07:14:48PM +0530, Himanshu Joshi wrote:
> >    On Wed, Nov 30, 2016 at 7:03 PM, William Hay <w....@ucl.ac.uk> wrote:
> >
> >      On Wed, Nov 30, 2016 at 04:50:02PM +0530, Himanshu Joshi wrote:
> >      >    On Wed, Nov 30, 2016 at 4:04 PM, William Hay <w....@ucl.ac.uk> wrote:
> >      >
> >      >      On Tue, Nov 29, 2016 at 10:35:35PM +0530, Himanshu Joshi wrote:
> >      >      >    On Tue, Nov 29, 2016 at 8:57 PM, William Hay <w....@ucl.ac.uk> wrote:
> >      >      >
> >      >      >      On Tue, Nov 29, 2016 at 05:43:47PM +0530, Himanshu Joshi wrote:
> >      >      >      >    On Tue, Nov 29, 2016 at 5:30 PM, William Hay <w....@ucl.ac.uk> wrote:
> >      >      >      >
> >      >      >      >      On Tue, Nov 29, 2016 at 03:52:05PM +0530, Himanshu Joshi wrote:
> >      >      >      >      >    On Mon, Nov 28, 2016 at 9:26 PM, William Hay <w....@ucl.ac.uk> wrote:
> >      >      >      >      >
> >      >      >      >      >      On Mon, Nov 28, 2016 at 06:16:00PM +0530, Himanshu Joshi wrote:
> >      >      >      >      >      >
> >      >      >      >      >      >    The SGE installation is now done.
> >      >      >      >      >      >
> >      >      >      >      >      >    The ps aux | grep "sge" command says:
> >      >      >      >      >      >
> >      >      >      >      >      >    root      7407  0.0  0.2 213524 38396 ?        Sl   16:37   0:01 /opt/sge/bin/lx-amd64/sge_qmaster
> >      >      >      >      >      >    root      9962  0.0  0.0 112648   960 pts/0    S+   17:53   0:00 grep --color=auto sge
> >      >      >      >      >      >    Then I ran
> >      >      >      >      >      >     service sgeexecd.mbialjpj55 start
> >      >      >      >      >      >       Starting Grid Engine execution daemon
> >      >      >      >      >      >
> >      >      >      >      >      >    but ps aux | grep "sge" still shows the same status:
> >      >      >      >      >      >
> >      >      >      >      >      >    root      7407  0.0  0.2 213524 38396 ?        Sl   16:37   0:01 /opt/sge/bin/lx-amd64/sge_qmaster
> >      >      >      >      >      >    root      9974  0.0  0.0 112648   960 pts/0    S+   17:54   0:00 grep --color=auto sge
> >      >      >      >      >      >    I would now set up SGE.
> >      >      >      >      >
> >      >      >      >      >      sge_execd should be running.
> >      >      >      >      >
> >      >      >      >      >      As root, try: /bin/sh -x /etc/init.d/sgeexecd.mbilajpj55 start
> >      >      >      >      >
> >      >      >      >      >
> >      >      >      >      >    [root@mbialjpj ~]# /bin/sh -x /etc/init.d/sgeexecd.mbilajpj55 start
> >      >      >      >      >    /bin/sh: /etc/init.d/sgeexecd.mbilajpj55: No such file or directory
> >      >      >      >
> >      >      >      >      Try again but with the obvious typo corrected.
> >      >      >      >
> >      >      >      >    Sorry for the typo.
> >      >      >      >    Here is the output...
> >      >      >      I think the original typo is mine.
> >      >      >      Looks like everything should work.
> >      >      >
> >      >      >      Can you try:
> >      >      >      cat /opt/sge/default/spool/mbialjpj/messages
> >      >      >
> >      >      >    It says:
> >      >      >
> >      >      >      cat: /opt/sge/default/spool/mbialjpj/messages: No such file or directory
> >      >      >
> >      >      >      The log file may contain clues as to why it died/failed to start.
> >      >      >
> >      >      >    I think you are looking for /opt/sge/default/spool/qmaster/messages;
> >      >      >    please find it attached.
> >      >      >    Regards
> >      >      No, I was hoping for the execd messages file.  I think it should be in the location I specified.
> >      >      Can you have a look under /opt/sge/default/spool for other directories and see if they have a messages file somewhere under them?
> >      >
> >      >    No, I have rechecked, and there is nothing named "messages" in the /opt/sge/default/spool folder. Surprisingly, this folder has only one subfolder, "qmaster".
> >      >    Even in the main directory "/opt", the desired file is not found.
> >      >    Regards
> >      >    Himanshu
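To search the whole installation tree for files named messages, as asked above, a single find command is less error-prone than looking by hand. This is a minimal sketch: it assumes the installation lives under /opt/sge, as in this thread, and find_messages is a hypothetical helper name, not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): list every regular file
# named "messages" below a directory, with size and modification time.
# The default directory /opt/sge is an assumption taken from this thread.
find_messages() {
    find "${1:-/opt/sge}" -type f -name messages -exec ls -l {} +
}
```

Run it as root (for example, find_messages /opt/sge) so that no spool directory is skipped because of permissions; an empty result means no messages file exists anywhere under that tree.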
> >
> >      In that case I would try running the sge_execd binary directly.
> >
> >      With the sge environment loaded, set the SGE_ND environment variable to true (which will keep the daemon in the foreground).
> >
> >    How do I set that?
> It looks like we have the proximate cause of the problem without it, so
> you probably won't need to; however, for the sake of completeness:
>
> IIRC root on your machine uses csh for some reason, in which case
> setenv SGE_ND true should do it.
> If, on the other hand, you are using bash, then
> export SGE_ND=true
> If you are using some other Bourne-like shell, then
> SGE_ND=true
> export SGE_ND
>
> should suffice to set it.
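For a Bourne-like shell such as bash, loading the environment, setting SGE_ND and starting the daemon can be wrapped in one function. This is a minimal sketch under stated assumptions: SGE_ROOT=/opt/sge and the cell name default (inferred from the /opt/sge/default/spool paths in this thread), the lx-amd64 binary directory seen in the ps output, and a $SGE_ROOT/$SGE_CELL/common/settings.sh written by inst_sge. run_execd_foreground is a hypothetical name. Under csh, the equivalent would be to source settings.csh and use setenv.

```shell
#!/bin/sh
# Hypothetical helper: load the Grid Engine environment and run sge_execd
# in the foreground so that its errors go to the terminal.
# Assumptions: SGE_ROOT=/opt/sge, cell "default", architecture lx-amd64.
# The body is a ( ... ) subshell, so the caller's environment is untouched.
run_execd_foreground() (
    SGE_ROOT=${SGE_ROOT:-/opt/sge}
    SGE_CELL=${SGE_CELL:-default}
    settings="$SGE_ROOT/$SGE_CELL/common/settings.sh"

    # Load the environment written by inst_sge, if it is present.
    if [ -r "$settings" ]; then
        . "$settings"
    fi

    # SGE_ND=true keeps the daemon in the foreground instead of detaching.
    SGE_ND=true
    export SGE_ND

    # Optional first argument overrides the binary path (useful for testing).
    exec "${1:-$SGE_ROOT/bin/lx-amd64/sge_execd}"
)
```

Run it as root with no arguments (run_execd_foreground). Because the daemon does not detach, errors such as the "can't bind socket" message should be printed straight to the terminal, and SGE_ND does not leak into the calling shell.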
>
>
> >    because currently running
> >    /opt/sge/bin/lx-amd64/sge_execd says:
> >    error: communication error for "mbialjpj/execd/1" running on port 1024: "can't bind socket"
> >    error: commlib error: can't bind socket (no additional information available)
> >    ..........................
> >    critical error: abort qmaster registration due to communication errors
> >    daemonize error: child exited before sending daemonize state
>
> Did you run the command as root?

Yes, I ran the command as root.


> If so, then that sounds like something else is using the port that
> sge_execd is trying to use.  Also, 1024 isn't the default port for
> sge_execd.  Did you deliberately set it to something unusual when
> running inst_sge?
>

I have no idea about this discrepancy. When running inst_sge I used
sge_qmaster 6444/tcp
and
sge_execd 6445/tcp

Accordingly, my /etc/services file reads:
sge_qmaster     6444/tcp  sge-qmaster   # Grid Engine Qmaster Service
sge_qmaster     6444/udp  sge-qmaster   # Grid Engine Qmaster Service
sge_execd       6445/tcp  sge-execd     # Grid Engine Execution Service
sge_execd       6445/udp  sge-execd     # Grid Engine Execution Service
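To see what the services file actually assigns to these names, a small lookup helps. The sketch below reads a services(5)-format file directly; service_port is a hypothetical helper name. getent services sge_execd asks the system resolver instead, which also covers NIS or other sources configured in nsswitch.conf. Grid Engine can also take its ports from the SGE_QMASTER_PORT and SGE_EXECD_PORT environment variables (inst_sge may write them into settings.sh), so env | grep '^SGE_' is worth checking as well.

```shell
#!/bin/sh
# Hypothetical helper: print the port that a services(5)-format file assigns
# to a service name (canonical name or alias) for a given protocol.
# Usage: service_port NAME [PROTO] [FILE]   (defaults: tcp, /etc/services)
service_port() {
    awk -v name="$1" -v proto="${2:-tcp}" '
        $1 ~ /^#/ { next }                  # skip comment lines
        NF < 2    { next }                  # skip blank or malformed lines
        {
            split($2, pp, "/")              # e.g. 6444/tcp -> pp[1]=6444, pp[2]=tcp
            if (pp[2] != proto) next
            if ($1 == name) { print pp[1]; exit }
            # Aliases follow the port/proto field, up to an optional comment.
            for (i = 3; i <= NF && $i !~ /^#/; i++)
                if ($i == name) { print pp[1]; exit }
        }' "${3:-/etc/services}"
}
```

With the /etc/services lines shown above, service_port sge_execd should print 6445, which is not the port 1024 named in the sge_execd error message earlier in the thread.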

>
> If you have fuser installed, then something like fuser -v 1024/tcp should
> give you the name of the process that is listening there.
>
This command lists:
Cannot stat file /proc/13859/fd/7: Permission denied
Cannot stat file /proc/13859/fd/78: Permission denied
Cannot stat file /proc/13859/fd/79: Permission denied
Cannot stat file /proc/13859/fd/80: Permission denied
Cannot stat file /proc/13859/fd/81: Permission denied
Cannot stat file /proc/13859/fd/82: Permission denied
Cannot stat file /proc/13859/fd/83: Permission denied
Cannot stat file /proc/13859/fd/84: Permission denied
Cannot stat file /proc/13859/fd/85: Permission denied
                     USER        PID ACCESS COMMAND
1024/tcp:            root       7407 F.... sge_qmaster
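The same check can be scripted against ss output. In the minimal sketch below, port_owner is a hypothetical helper that reads ss -ltnp on standard input and prints the local address and owning process of every socket listening on a given TCP port. The -n flag matters: without it, ss prints ports as service names from the services database, which is presumably why pid 7407 appears as *:sge_qmaster in the ss -ltmp output below but as 0.0.0.0:1024 in the netstat output.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -ltnp` output on stdin and print the local
# address and process field of every socket listening on TCP port $1.
port_owner() {
    awk -v port="$1" '
        NR == 1 { next }                    # skip the header line
        {
            n = split($4, a, ":")           # local address, e.g. *:1024 or [::]:1024
            if (a[n] == port)
                print $4, (NF >= 6 ? $6 : "-")
        }'
}
```

Typical use, as root so that process names are visible: ss -ltnp | port_owner 1024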


>
> Otherwise, netstat -ltnp or ss -ltmp will list all listening processes,
> and you can check the output for whatever is listening on port 1024.
The netstat -ltnp command also says the same:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:33738           0.0.0.0:*               LISTEN      1620/rpc.statd
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 0.0.0.0:20048           0.0.0.0:*               LISTEN      1699/rpc.mountd
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      3188/dnsmasq
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1637/sshd
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN      4461/cupsd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      3043/master
tcp        0      0 0.0.0.0:1024            0.0.0.0:*               LISTEN      7407/sge_qmaster
tcp        0      0 0.0.0.0:40513           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:2049            0.0.0.0:*               LISTEN      -
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
tcp6       0      0 :::20048                :::*                    LISTEN      1699/rpc.mountd
tcp6       0      0 :::42769                :::*                    LISTEN      1620/rpc.statd
tcp6       0      0 :::22                   :::*                    LISTEN      1637/sshd
tcp6       0      0 :::31415                :::*                    LISTEN      15788/MATLAB
tcp6       0      0 ::1:631                 :::*                    LISTEN      4461/cupsd
tcp6       0      0 ::1:25                  :::*                    LISTEN      3043/master
tcp6       0      0 :::45086                :::*                    LISTEN      -
tcp6       0      0 :::2049                 :::*                    LISTEN      -

And the ss -ltmp command says:

State      Recv-Q Send-Q              Local Address:Port                               Peer Address:Port
LISTEN     0      128                             *:33738                                         *:*                     users:(("rpc.statd",pid=1620,fd=9))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      128                             *:sunrpc                                        *:*                     users:(("rpcbind",pid=1079,fd=5),("systemd",pid=1,fd=42))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      128                             *:mountd                                        *:*                     users:(("rpc.mountd",pid=1699,fd=8))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      5                   192.168.122.1:domain                                        *:*                     users:(("dnsmasq",pid=3188,fd=6))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      128                             *:ssh                                           *:*                     users:(("sshd",pid=1637,fd=3))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      128                     127.0.0.1:ipp                                           *:*                     users:(("cupsd",pid=4461,fd=12))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      100                     127.0.0.1:smtp                                          *:*                     users:(("master",pid=3043,fd=13))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      5                               *:sge_qmaster                                   *:*                     users:(("sge_qmaster",pid=7407,fd=3))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      64                              *:40513                                         *:*
     skmem:(r0,rb65536,t0,tb65536,f0,w0,o0,bl0)
LISTEN     0      64                              *:nfs                                           *:*
     skmem:(r0,rb8421376,t0,tb8421376,f0,w0,o0,bl0)
LISTEN     0      128                            :::sunrpc                                       :::*                     users:(("rpcbind",pid=1079,fd=4),("systemd",pid=1,fd=41))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      128                            :::mountd                                       :::*                     users:(("rpc.mountd",pid=1699,fd=10))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      128                            :::42769                                        :::*                     users:(("rpc.statd",pid=1620,fd=11))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      128                            :::ssh                                          :::*                     users:(("sshd",pid=1637,fd=4))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      50                             :::31415                                        :::*                     users:(("MATLAB",pid=15788,fd=390))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      128                           ::1:ipp                                          :::*                     users:(("cupsd",pid=4461,fd=11))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      100                           ::1:smtp                                         :::*                     users:(("master",pid=3043,fd=14))
     skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
LISTEN     0      64                             :::45086                                        :::*
     skmem:(r0,rb65536,t0,tb65536,f0,w0,o0,bl0)
LISTEN     0      64                             :::nfs                                          :::*
     skmem:(r0,rb8421376,t0,tb8421376,f0,w0,o0,bl0)

Did I do something wrong when running inst_sge?

-- 
Himanshu Joshi
M.Tech. Cognitive & Neuroscience.
Ph.D Scholar,
Department of Psychiatry
NIMHANS, Bangalore
Publications
<https://scholar.google.co.in/citations?hl=en&user=OspDsGUAAAAJ&view_op=list_works&gmla=AJsN-F4EvpCnES94r26jSpcDQFnN_-rSpEtp0PNdwObxCjniNpjkL55yPooOzK6epx6bHLvPuwJ2LIL3Wgkvxn4xeZXy5Wh0NpiR4E_Ebq88a1jaCS4r5q14b_4jCaeeDct8aeK15Bxr>
Multimodal Brain Image Analysis Laboratory
<http://mbial.weebly.com/himanshu-joshi.html>
_______________________________________________
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
