Am 15.07.2011 um 01:14 schrieb Harry Mangalam:

> On Thursday 14 July 2011 16:07:55 you wrote:
> > Am 15.07.2011 um 00:58 schrieb Harry Mangalam:
> > > <snip>
> > > ...
> > > hmangala@n103:~
> > > 504 $ telnet bduc-sched 536
> > > Trying 10.255.78.3...
> > > Connected to bduc-sched.nacs.uci.edu (10.255.78.3).
> > > Escape character is '^]'.
> > > ...
> > 
> > Fine. Next tool is `qping`:
> Thank you, this seems to be producing some more info.
> > 
> > $ qping name_of_your_qmaster_machine 536 qmaster 1
> From one of the non-starters (n103):
>  hmangala@n103:~
> 502 $  qping bduc-sched 536 qmaster 1
> endpoint bduc-sched.nacs.uci.edu/qmaster/1 at port 536: can't find connection
> access denied: client IP resolved to host name "". This is not identical to 
> clients host name ""
> endpoint bduc-sched.nacs.uci.edu/qmaster/1 at port 536: can't find connection
> > $ qping -info name_of_your_qmaster_machine 536 qmaster 1
> hmangala@n103:~
> 503 $  qping -info bduc-sched 536 qmaster 1
> endpoint bduc-sched.nacs.uci.edu/qmaster/1 at port 536: can't find connection
> access denied: client IP resolved to host name "". This is not identical to 
> clients host name ""

Then please check the output on the qmaster and on the exec machine(s) when you
use the tools in $SGE_ROOT/utilbin/lx24_amd64, like:

$ ./gethostbyaddr -all 10.255.78.3
$ ./gethostbyname -all bduc-sched
$ ./gethostbyname -all n103
$ ./gethostname
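If you'd rather script the same comparison across many nodes, here is a minimal
Python sketch using only the standard `socket` module. It forward-resolves a
name, reverse-resolves each address it maps to, and flags any mismatch or empty
result (the empty "" in the qping error above looks like a reverse lookup that
returned nothing). The helper names `short_name`, `names_match` and `check_host`
are hypothetical, and treating `ignore_fqdn=True` as "compare short names only"
is an assumption about SGE's behaviour, not taken from its source.

```python
import socket
import sys


def short_name(host):
    """Strip the domain part: "n103.nacs.uci.edu" -> "n103"."""
    return host.split(".", 1)[0]


def names_match(claimed, resolved, ignore_fqdn=False):
    """Compare the name a host claims with the name its IP resolves back to.

    An empty name never matches. With ignore_fqdn=True only the short
    names are compared (assumption: approximates SGE's ignore_fqdn).
    """
    if not claimed or not resolved:
        return False
    if ignore_fqdn:
        return short_name(claimed).lower() == short_name(resolved).lower()
    return claimed.lower() == resolved.lower()


def check_host(name, ignore_fqdn=False):
    """Forward-resolve NAME, then reverse-resolve every address it maps to."""
    canonical, aliases, addresses = socket.gethostbyname_ex(name)
    results = []
    for addr in addresses:
        try:
            reverse = socket.gethostbyaddr(addr)[0]
        except OSError:  # socket.herror / socket.gaierror: no reverse entry
            reverse = ""
        results.append((addr, reverse, names_match(canonical, reverse, ignore_fqdn)))
    return canonical, aliases, results


if __name__ == "__main__":
    # Usage: python check_hosts.py bduc-sched n103   (defaults to this host)
    for host in sys.argv[1:] or [socket.gethostname()]:
        try:
            canonical, aliases, results = check_host(host)
        except OSError as exc:
            print(f"{host}: forward lookup failed: {exc}")
            continue
        print(f"{host} -> {canonical} (aliases: {', '.join(aliases) or 'none'})")
        for addr, reverse, ok in results:
            status = "OK" if ok else "MISMATCH"
            print(f"  {addr} -> {reverse or '<empty>'} [{status}]")
```

Run it on the qmaster and on n102/n103 with the same arguments; the answers
should be identical on every machine.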

Do they all match up for the particular machines? Do you use NIS or similar, or
are all hosts recorded in local files?
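On a glibc-based Linux box the order in which local files, NIS and DNS are
consulted for host names comes from the `hosts:` line of /etc/nsswitch.conf. A
small sketch to print that order on each node, assuming the usual
"hosts: files nis dns" syntax; `hosts_sources` is a hypothetical helper name.

```python
import re
from pathlib import Path


def hosts_sources(text):
    """Return the lookup sources named on the 'hosts:' line, in order.

    Comments and [STATUS=action] clauses are dropped, e.g.
    "hosts: files [NOTFOUND=return] nis dns" -> ["files", "nis", "dns"].
    """
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]
        match = re.match(r"\s*hosts\s*:(.*)", line)
        if match:
            rest = re.sub(r"\[[^\]]*\]", " ", match.group(1))
            return rest.split()
    return []


if __name__ == "__main__":
    conf = Path("/etc/nsswitch.conf")
    if conf.exists():
        order = " -> ".join(hosts_sources(conf.read_text()))
        print("hosts lookup order:", order or "<not set>")
    else:
        print("no /etc/nsswitch.conf on this machine")
```

If the working and failing nodes report different orders, or NIS is listed but
not reachable from some of them, that would be a good place to start looking.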

Did you enable or disable honoring the FQDN in SGE (recorded in
$SGE_ROOT/default/common/bootstrap)?
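As far as I know the relevant keys in that file are `ignore_fqdn` and
`default_domain`, written as whitespace-separated "key value" lines. A minimal
sketch to read them, assuming that layout and that $SGE_CELL falls back to
"default" as in the path above; `parse_bootstrap` and `fqdn_settings` are
hypothetical helper names.

```python
import os
from pathlib import Path


def parse_bootstrap(text):
    """Parse the 'key value' lines of an SGE bootstrap file into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def fqdn_settings(settings):
    """Return (ignore_fqdn, default_domain); domain is None when unset or 'none'."""
    ignore = settings.get("ignore_fqdn", "").lower() == "true"
    domain = settings.get("default_domain", "none")
    return ignore, (None if domain.lower() == "none" else domain)


def main():
    root = os.environ.get("SGE_ROOT")
    if not root:
        print("SGE_ROOT is not set; source your SGE settings file first")
        return
    cell = os.environ.get("SGE_CELL", "default")
    path = Path(root) / cell / "common" / "bootstrap"
    try:
        text = path.read_text()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    ignore, domain = fqdn_settings(parse_bootstrap(text))
    print(f"{path}: ignore_fqdn={ignore}, default_domain={domain or 'none'}")


if __name__ == "__main__":
    main()
```

Whichever way it is set, the names reported by the tools above have to agree
under that rule (full names if the FQDN is honored, short names otherwise).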

-- Reuti


> From one of the 'connected' nodes (n102):
> hmangala@n102:~
> 501 $  qping bduc-sched 536 qmaster 1
> 07/14/2011 23:12:16 endpoint bduc-sched.nacs.uci.edu/qmaster/1 at port 536 is 
> up since 191495 seconds
> 07/14/2011 23:12:17 endpoint bduc-sched.nacs.uci.edu/qmaster/1 at port 536 is 
> up since 191496 seconds
> 07/14/2011 23:12:18 endpoint bduc-sched.nacs.uci.edu/qmaster/1 at port 536 is 
> up since 191497 seconds
> hmangala@n102:~
> 502 $ qping -info bduc-sched 536 qmaster 1
> 07/14/2011 23:12:59:
> SIRM version:             0.1
> SIRM message id:          1
> start time:               07/12/2011 18:00:41 (1310493641)
> run time [s]:             191538
> messages in read buffer:  0
> messages in write buffer: 0
> nr. of connected clients: 139
> status:                   1
> info:                     MAIN: E (191538.46) | signaler000: E (191537.17) | 
> event_master000: E (0.52) | timer000: E (0.52) | worker000: E (1.09) | 
> worker001: E (0.90) | listener000: E (0.90) | listener001: E (1.25) | 
> scheduler000: E (1.28) | WARNING
> malloc:                   arena(15892480) |ordblks(3939) | smblks(34) | 
> hblksr(0) | hblhkd(0) usmblks(0) | fsmblks(1232) | uordblks(6483312) | 
> fordblks(9409168) | keepcost(133688)
> Monitor:                  disabled
> -- 
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> [ZOT 2225] / 92697  Google Voice Multiplexer: (949) 478-4487 
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
> --
> Unhappy? Grouchy, pedantic old geezer available to follow you 
> relentlessly until your current life seems like paradise.


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
