On 15.07.2011 at 21:29, Harry Mangalam wrote:

> On Friday 15 July 2011 03:06:26 you wrote:
> > You use either DNS or NIS, not both. What is the order in
> > /etc/nsswitch.conf

Is it really a mixture of NIS and NISPLUS? I thought these need different 
daemons.

> The current /etc/nsswitch.conf config is:
> passwd:     nis files
> shadow:     nis files

Up to now I have only seen this there:

passwd: compat
group:  compat
shadow: compat

> group:      nis files
> hosts:      files nis dns

Does

$ ypcat hosts

give reliable results? Maybe there are false entries in the local /etc/hosts 
which you copy to all nodes. If some of the nodes are listed therein, they 
would get a name which could differ from the one delivered by NIS. Can 
you compare the qmaster's /etc/hosts with the one on the node and with 
the output of `ypcat hosts`?

Maybe the NIS tables weren't rebuilt with `make -C /var/yp` after the last change.
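
A minimal sketch of such a comparison, assuming both inputs are in /etc/hosts format (address first, canonical name second). The function name `compare_hosts` and the temporary file used below are only illustrative, not part of SGE or NIS:

```shell
#!/bin/sh
# compare_hosts FILE_A FILE_B   (hypothetical helper name)
# Both files are expected in /etc/hosts format: "IP name [aliases...]".
# Prints every address that appears in both files with a different first name.
compare_hosts() {
    awk '
        # Skip comment lines and lines without an address/name pair.
        /^[[:space:]]*#/ || NF < 2 { next }
        # First file: remember the first name listed for each address.
        NR == FNR { name[$1] = $2; next }
        # Second file: report addresses whose first name differs.
        ($1 in name) && name[$1] != $2 {
            printf "%s: %s vs %s\n", $1, name[$1], $2
        }
    ' "$1" "$2"
}
```

On the qmaster or a node this could be used roughly as `ypcat hosts > /tmp/nis_hosts; compare_hosts /etc/hosts /tmp/nis_hosts`. No output would mean that the two sources agree on the name for every address they share.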

-- Reuti



> bootparams: nisplus [NOTFOUND=return] files
> ethers:     files
> netmasks:   files
> networks:   files
> protocols:  files
> rpc:        files
> services:   files
> netgroup:   files nis
> publickey:  nisplus
> automount:  nis
> aliases:    files nisplus
> (This brings up a complication that the NIS/YP side of things is handled by 
> another admin who is not always available.)
> > I usually have all machines hard coded in /etc/hosts on the
> > headnode and read this by NIS on all nodes. No nameserver or DNS
> > involved inside the cluster.
> Well, the solution that you allude to (putting the IP/name info into the SGE 
> qmaster /etc/hosts) //does// solve the problem.  So for that problem, this is 
> now solved.
> But I still don't understand why some of them booted and some didn't before 
> that mod.
> But thanks very much for your time, suggestions and pointers.
> > -- Reuti
> > 
> > > But after restarting the qmaster, the 'bad' nodes can re-join the
> > > qmaster:
> > > qhost | grep ^n
> > > n101                    lx24-amd64      2  0.00   15.8G   89.2M    7.5G     0.0
> > > n102                    lx24-amd64      2  0.00   15.8G   91.1M    7.5G     0.0
> > > (etc - all nodes rejoined)
> > > and why should it work after a qmaster restart..?
> > > (other commands return expected results.)
> > > 
> > > > $ ./gethostbyname -all bduc-sched
> > > 
> > > Hostname: bduc-sched.nacs.uci.edu
> > > SGE name: bduc-sched.nacs.uci.edu
> > > Aliases:  bduc-sched
> > > Host Address(es): 128.200.15.19
> > > 
> > > > $ ./gethostbyname -all n103
> > > 
> > > Hostname: n103.bduc
> > > SGE name: n103.bduc
> > > Aliases:
> > > Host Address(es): 10.255.78.143
> > > 
> > > > $ ./gethostname
> > > 
> > > Hostname: bduc-sched.nacs.uci.edu
> > > Aliases:  bduc-sched
> > > Host Address(es): 128.200.15.19
> > > # following range includes both connectors and non-connectors:
> > > for NAME in n101 n102 n103 n104 n105; do ./gethostbyname -all $NAME; done
> > > Hostname: n101.bduc
> > > SGE name: n101.bduc
> > > Aliases:
> > > Host Address(es): 10.255.78.120
> > > Hostname: n102.bduc (connects OK)
> > > SGE name: n102.bduc
> > > Aliases:
> > > Host Address(es): 10.255.78.196
> > > Hostname: n103.bduc
> > > SGE name: n103.bduc
> > > Aliases:
> > > Host Address(es): 10.255.78.143
> > > Hostname: n104.bduc
> > > SGE name: n104.bduc
> > > Aliases:
> > > Host Address(es): 10.255.78.121
> > > Hostname: n105.bduc
> > > SGE name: n105.bduc
> > > Aliases:
> > > Host Address(es): 10.255.78.106
> > > 
> > > > Match all up for the particular machines? You use NIS or so or
> > > > all are recorded in local files?
> > > > 
> > > > Did you enable/disable in SGE to honor the FQDN (recorded in
> > > > $SGE_ROOT/default/common/bootstrap)?
> > > > 
> > > > -- Reuti
> > > > 
> > > > > from one of the 'connected' nodes:
> > > > > hmangala@n102:~
> > > > > 501 $  qping bduc-sched 536 qmaster 1
> > > > > 07/14/2011 23:12:16 endpoint bduc-sched.nacs.uci.edu/qmaster/1 at port 536 is up since 191495 seconds
> > > > > 07/14/2011 23:12:17 endpoint bduc-sched.nacs.uci.edu/qmaster/1 at port 536 is up since 191496 seconds
> > > > > 07/14/2011 23:12:18 endpoint bduc-sched.nacs.uci.edu/qmaster/1 at port 536 is up since 191497 seconds
> > > > > hmangala@n102:~
> > > > > 502 $ qping -info bduc-sched 536 qmaster 1
> > > > > 07/14/2011 23:12:59:
> > > > > SIRM version:             0.1
> > > > > SIRM message id:          1
> > > > > start time:               07/12/2011 18:00:41 (1310493641)
> > > > > run time [s]:             191538
> > > > > messages in read buffer:  0
> > > > > messages in write buffer: 0
> > > > > nr. of connected clients: 139
> > > > > status:                   1
> > > > > info:                     MAIN: E (191538.46) | signaler000: E (191537.17) | event_master000: E (0.52) | timer000: E (0.52) | worker000: E (1.09) | worker001: E (0.90) | listener000: E (0.90) | listener001: E (1.25) | scheduler000: E (1.28) | WARNING
> > > > > malloc:                   arena(15892480) |ordblks(3939) | smblks(34) | hblksr(0) | hblhkd(0) usmblks(0) | fsmblks(1232) uordblks(6483312) | fordblks(9409168) | keepcost(133688)
> > > > > 
> > > > > Monitor:                  disabled
> -- 
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> [ZOT 2225] / 92697  Google Voice Multiplexer: (949) 478-4487 
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
> --
> Unhappy? Grouchy, pedantic old geezer available to follow you 
> relentlessly until your current life seems like paradise.


