On Wed, Nov 21, 2012 at 05:01:19PM +0000, Reuti wrote:
> Am 21.11.2012 um 15:43 schrieb Simon Hood:
> 
> > Hi All,
> > 
> > I need to add a network to my Gridengine master host, which already has 
> > three networks.
> > But when I stop sge_master, add the network and try to start, it won't 
> > restart.  
> 
> Any error message - any file in /tmp showing the error? Which one do you get?

Hi Reuti, Dave,

Thanks for the questions and replies.  /tmp/sge_messages and
SGE_ROOT/default/spool/qmaster/messages contained only what is appended
below, nothing that gave me a clue.

But having left the problem at 16:00, gone out for several beers and
returned now, at 22:00, the solution suddenly shouts at me.  Firewall!!!
The new network is on eth0, which previously had a different network on it.
The new network's IP was blocked at the "lo" interface.  Apparently SGE
likes traffic to/from all IPs on a host to be allowed through "lo".
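
In case it helps anyone who hits the same thing, here is a minimal sketch of
how one might check whether the host can talk to each of its own addresses
(traffic that normally passes through "lo").  It assumes Python 3; the
function name loopback_reachable is my own invention, and the addresses are
simply the ones from the /etc/hosts quoted further down.  It opens a
throwaway listener on an ephemeral port, so it does not touch port 6444 and
is not anything SGE itself provides.

```python
import socket


def loopback_reachable(addr, timeout=2.0):
    """Return True if this host can open a TCP connection to its own address.

    A temporary listener is bound to an ephemeral port on `addr` and a client
    socket then connects to it. A firewall rule that drops local traffic for
    `addr` should show up here as a timeout or refusal, reported as False.
    An address that is not configured on this host also returns False,
    because the bind fails.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind((addr, 0))          # port 0: let the OS pick a free port
            server.listen(1)
            port = server.getsockname()[1]
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
                client.settimeout(timeout)
                client.connect((addr, port))
        return True
    except OSError:                         # covers timeouts and refusals too
        return False


if __name__ == "__main__":
    # Assumption: these are the addresses configured on the master host.
    for addr in ("10.99.203.190", "10.2.49.100", "10.2.2.250", "10.3.3.250"):
        status = "ok" if loopback_reachable(addr) else "BLOCKED or not configured"
        print(addr, status)
```

Run as root or as an ordinary user on the master host itself; any address
reported as blocked is worth comparing against the firewall rules for "lo".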

Forgive my total paranoia re the firewall.  But many years' experience
suggests that allowing exactly what is required, no more and no less, is a
good way to catch our naughty postgrads out at attempted mischief.

All working now.

Simon


================================

 -- /tmp/sge_messages contained only, for example

    11/21/2012 21:45:51|  main|login|E|communication error for "login.redqueen.rcs.manchester.ac.uk/qmaster/1" running on port 6444: "can't bind socket"
    11/21/2012 21:45:52|  main|login|E|commlib error: can't bind socket (no additional information available)
    11/21/2012 21:46:20|  main|login|C|abort qmaster startup due to communication errors

 -- SGE_ROOT/default/spool/qmaster/messages contained only

    11/21/2012 22:02:31|  main|login|I|starting up GE 6.2 (lx24-amd64)
    11/21/2012 22:03:12|  main|login|E|jvm thread is not running
    11/21/2012 22:03:18|  main|login|I|controlled shutdown 6.2
    11/21/2012 22:03:45|  main|login|I|read job database with 18 entries in 0 seconds
    11/21/2012 22:03:45|  main|login|I|qmaster hard descriptor limit is set to 8192
    11/21/2012 22:03:45|  main|login|I|qmaster soft descriptor limit is set to 8192
    11/21/2012 22:03:45|  main|login|I|qmaster will use max. 8172 file descriptors for communication
    11/21/2012 22:03:45|  main|login|I|qmaster will accept max. 99 dynamic event clients

================================
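
For picking the interesting lines out of a longer messages file, a small
sketch along these lines may do.  It assumes Python 3 and that every line
follows the pipe-separated layout seen above; the field names (timestamp,
thread, host, level, text) and both function names are my own labels, not
anything defined by SGE, and treating "E" and "C" as the levels worth
looking at is likewise an assumption based only on the excerpts here.

```python
def parse_sge_message(line):
    """Split one messages line into its five pipe-separated fields.

    Returns a dict, or None if the line does not have the expected layout.
    The message text is kept whole even if it contains a '|' itself.
    """
    parts = line.strip().split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, thread, host, level, text = (p.strip() for p in parts)
    return {
        "timestamp": timestamp,
        "thread": thread,
        "host": host,
        "level": level,
        "text": text,
    }


def serious_messages(lines, levels=("E", "C")):
    """Return the parsed entries whose level is in `levels`."""
    parsed = (parse_sge_message(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] in levels]


if __name__ == "__main__":
    # Assumption: the file is readable and small enough to scan line by line.
    with open("/tmp/sge_messages") as handle:
        for entry in serious_messages(handle):
            print(entry["timestamp"], entry["level"], entry["text"])
```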



 
> -- Reuti
> 
> 
> > In /etc/hosts:
> > 
> > 127.0.0.1       localhost.localdomain localhost
> > #
> > ::1             localhost6.localdomain6 localhost6
> > #
> > 10.99.203.190   test.manchester.ac.uk  test
> > #
> > 10.2.49.100     login-stg.test.manchester.ac.uk  login-stg
> > #
> > 10.2.2.250      login.test.manchester.ac.uk login
> > 10.3.3.250      login-3.test.manchester.ac.uk login-3
> > 
> > 
> > In host_aliases:
> > 
> > login.test.manchester.ac.uk login login-3.test.manchester.ac.uk login-3 login-stg.test.manchester.ac.uk login-stg test.manchester.ac.uk test
> > 
> > 
> > Some tests:
> > 
> > hostname 
> > login.test.manchester.ac.uk
> > 
> > hostname -f
> > login.test.manchester.ac.uk
> > 
> > 
> > root@test>  ./gethostbyname -aname login-3
> > login.test.manchester.ac.uk
> > root@test>  ./gethostbyname -aname login-3.test.manchester.ac.uk
> > login.test.manchester.ac.uk
> > root@test>  ./gethostbyname -aname login.test.manchester.ac.uk
> > login.test.manchester.ac.uk
> > root@test>  ./gethostbyname -aname login
> > login.test.manchester.ac.uk
> > root@test>  ./gethostbyname -aname test
> > login.test.manchester.ac.uk
> > root@test>  ./gethostbyname -aname test.manchester.ac.uk
> > login.test.manchester.ac.uk
> > root@test>  ./gethostbyname -aname login-stg.test.manchester.ac.uk
> > login.test.manchester.ac.uk
> > root@test>  ./gethostbyname -aname login-stg
> > login.test.manchester.ac.uk
> > 
> > 
> > root@test>  ./gethostbyaddr -aname 10.99.203.190
> > login.test.manchester.ac.uk
> > root@test>  ./gethostbyaddr -aname 10.2.2.250
> > login.test.manchester.ac.uk
> > root@test>  ./gethostbyaddr -aname 10.3.3.250
> > login.test.manchester.ac.uk
> > root@test>  ./gethostbyaddr -aname 10.2.49.100
> > login.test.manchester.ac.uk
> > 
> > 
> > What am I missing?
> > 
> > Cheers
> > 
> > Simon
> > 
> > 
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
> 
> 
