I ran the source command! It was successful, without errors. I don't have any firewall, and there aren't any relevant messages in /var/log.
But...

/etc/init.d/sgemaster.p6444
starting sge_qmaster
sge_qmaster start problem
sge_qmaster didn't start!

cat /tmp/sge_messages
07/08/2011 20:36:11| main|proyecto-192|C|abort qmaster startup due to communication errors

:(

2011/7/8 Reuti <[email protected]>
> On 08.07.2011 at 19:32, Carlos Scaloni wrote:
>
> > Thanks again crack! I have several users. settings.sh: what do I have to do with this file? These are the contents of this file:
>
> Source it, in bash by:
>
> . /usr/global/sge-6.2u5-bin/default/common/settings.sh
>
> (note the space after the dot) or
>
> source /usr/global/sge-6.2u5-bin/default/common/settings.sh
>
> (check `man bash` for "source filename"). As I don't know where you can set it up on a global level in your particular distribution, you have to investigate it by other means.
>
> > cat /usr/global/sge-6.2u5-bin/default/common/settings.sh
> > SGE_ROOT=/usr/global/sge-6.2u5-bin; export SGE_ROOT
> >
> > ARCH=`$SGE_ROOT/util/arch`
> > DEFAULTMANPATH=`$SGE_ROOT/util/arch -m`
> > MANTYPE=`$SGE_ROOT/util/arch -mt`
> >
> > SGE_CELL=default; export SGE_CELL
> > SGE_CLUSTER_NAME=p6444; export SGE_CLUSTER_NAME
> > SGE_QMASTER_PORT=6444; export SGE_QMASTER_PORT
> > SGE_EXECD_PORT=6445; export SGE_EXECD_PORT
> > .... etc.
> >
> > I changed the hostname myself when I tried to fix the problems...
> >
> > cat /usr/global/sge-6.2u5-bin/default/common/act_qmaster
> > proyecto-192.local
>
> Fine, so it's just running on the second interface. Did you set up any firewall which blocks the traffic? You can also check the various logfiles in /var/log to spot any error message pointing to it.
>
> -- Reuti
>
> > 2011/7/8 Reuti <[email protected]>
> > On 08.07.2011 at 16:43, Carlos Scaloni wrote:
> >
> > > Firstly, "echo $SGE_ROOT" doesn't show anything...
> > > This environment variable doesn't exist.
> > >
> > > But I went to: /usr/global/sge-6.2u5-bin/utilbin/lx24-amd64
> > >
> > > I ran the following commands:
> > >
> > > ./gethostname -all
> > > critical error: Please set the environment variable SGE_ROOT.
> >
> > To have access to all SGE commands, it's necessary to source /usr/global/sge-6.2u5-bin/default/common/settings.sh or similar, depending on the location where you installed it. Best is something like a global profile for all users (or just your own if you are the only user).
> >
> > > hostname
> > > proyecto-192.local
> >
> > This could point to the problem: it's accessing the qmaster on the wrong interface. Did you define the hostname by hand, or was it chosen automatically during installation of the OS?
> >
> > What's in the file:
> >
> > /usr/global/sge-6.2u5-bin/default/common/act_qmaster
> >
> > -- Reuti
> >
> > > 2011/7/8 Reuti <[email protected]>
> > > On 07.07.2011 at 19:44, Carlos Scaloni wrote:
> > >
> > > > It's a virtual machine. I want to install the qmaster and the execd on the same machine. 192.168.56.101 is connected with the outside world!
> > > >
> > > > ifconfig
> > > > eth0      Link encap:Ethernet  HWaddr 08:00:27:F3:80:43
> > > >           inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
> > > >           inet6 addr: fe80::a00:27ff:fef3:8043/64 Scope:Link
> > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > >           RX packets:1635 errors:0 dropped:0 overruns:0 frame:0
> > > >           TX packets:1506 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:1000
> > > >           RX bytes:1390858 (1.3 MiB)  TX bytes:104832 (102.3 KiB)
> > >
> > > Okay, so this is the main interface and the one which gives the name under which the qmaster can be accessed.
> > > But below, it's trying to access it under "| main|proyecto-192|C|". There are some tools in $SGE_ROOT/utilbin/lx24-amd64:
> > >
> > > $ ./gethostname -all
> > >
> > > and then:
> > >
> > > $ ./gethostbyname <name>
> > > $ ./gethostbyaddr <addr>
> > >
> > > with these names and addresses. Is there any firewall installed blocking traffic to the machine itself? This needs to be resolved: why is it running under eth1? There is a file to map hostnames to interfaces and make an alias for them, but I think in your case we have to look elsewhere, as you want to run it on the main interface. Does a plain:
> > >
> > > $ hostname
> > >
> > > give you the name from eth0?
> > >
> > > -- Reuti
> > >
> > > > eth1      Link encap:Ethernet  HWaddr 08:00:27:14:C4:0C
> > > >           inet addr:192.168.56.101  Bcast:192.168.56.255  Mask:255.255.255.0
> > > >           inet6 addr: fe80::a00:27ff:fe14:c40c/64 Scope:Link
> > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > >           RX packets:1805 errors:0 dropped:0 overruns:0 frame:0
> > > >           TX packets:1152 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:1000
> > > >           RX bytes:162007 (158.2 KiB)  TX bytes:186807 (182.4 KiB)
> > > >
> > > > lo        Link encap:Local Loopback
> > > >           inet addr:127.0.0.1  Mask:255.0.0.0
> > > >           inet6 addr: ::1/128 Scope:Host
> > > >           UP LOOPBACK RUNNING  MTU:16436  Metric:1
> > > >           RX packets:92 errors:0 dropped:0 overruns:0 frame:0
> > > >           TX packets:92 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:0
> > > >           RX bytes:4600 (4.4 KiB)  TX bytes:4600 (4.4 KiB)
> > > >
> > > > 2011/7/7 Reuti <[email protected]>
> > > > On 07.07.2011 at 19:27, Carlos Scaloni wrote:
> > > >
> > > > > Hi, thanks for answering!
> > > > >
> > > > > I have a file in /tmp called sge_messages with this content:
> > > > >
> > > > > 07/07/2011 18:59:41| main|proyecto-192|C|abort qmaster startup due to communication errors
> > > >
> > > > Well, you listed the complete /etc/hosts, i.e.
> > > > no 127.0.0.2 is present (having no entry for it is good)?
> > > >
> > > > What is the primary interface on the master node? As I see two entries, I assume you have at least two network interfaces, one of them connected to the outside world and the other to the nodes. Maybe it's addressing the cluster on the wrong one.
> > > >
> > > > -- Reuti
> > > >
> > > > > 2011/7/7 Reuti <[email protected]>
> > > > > Hi,
> > > > >
> > > > > On 07.07.2011 at 19:11, Carlos Scaloni wrote:
> > > > >
> > > > > > Hi friends! I can't install SGE; I need your help, please. Thanks a lot in advance!
> > > > > >
> > > > > > Options I chose:
> > > > > >
> > > > > > admin user is sgeadmin
> > > > > > set network ports with environment
> > > > > > sge_qmaster port 6444
> > > > > > sge_execd port 6445
> > > > > > say no to pkgadd and verify permissions
> > > > > > classic spooling, not Berkeley DB
> > > > > > gid range 20000-21000
> > > > > > enter list of execution hosts node01 thru node##
> > > > >
> > > > > Did you get any error log in /tmp?
> > > > >
> > > > > -- Reuti
> > > > >
> > > > > > Error:
> > > > > >
> > > > > > Grid Engine qmaster startup
> > > > > > ---------------------------
> > > > > >
> > > > > > Starting qmaster daemon. Please wait ...
> > > > > > starting sge_qmaster
> > > > > >
> > > > > > sge_qmaster start problem
> > > > > >
> > > > > > sge_qmaster didn't start!
> > > > > >
> > > > > > sge_qmaster start problem
> > > > > >
> > > > > > cat /etc/hosts
> > > > > >
> > > > > > 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
> > > > > > ::1 localhost
> > > > > > 192.168.56.101 proyecto-192.local
> > > > > > 10.0.2.15 proyecto-10.local
> > > > > >
> > > > > > _______________________________________________
> > > > > > users mailing list
> > > > > > [email protected]
> > > > > > https://gridengine.org/mailman/listinfo/users
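Reuti's diagnosis, that the name in act_qmaster (proyecto-192.local) belongs to eth1 rather than the main interface eth0, can be cross-checked against the posted /etc/hosts and ifconfig output. The following is a minimal sketch, not a Grid Engine tool: hosts_addr, iface_for_addr and is_loopback_mapped are hypothetical helper names. They read a hosts file directly (so DNS and nsswitch.conf are not consulted), and iface_for_addr assumes the older net-tools `ifconfig` format shown in this thread (`inet addr:...`). The utilbin tools gethostbyname and gethostbyaddr that Reuti mentions are presumably still the better check of what SGE itself resolves.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of Grid Engine.

# Print the address that a hosts-format file maps NAME to (first match wins).
# Reads the file directly, so DNS and nsswitch.conf are not consulted.
#   $1: host name   $2: hosts file
hosts_addr() {
    awk -v n="$1" '
        /^[ \t]*#/ { next }                      # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$2"
}

# Print the interface carrying IPv4 address ADDR, given saved output of the
# older net-tools ifconfig ("eth0  Link encap:..." / "inet addr:10.0.2.15 ...").
#   $1: IPv4 address   $2: file holding the ifconfig output
iface_for_addr() {
    awk -v a="addr:$1" '
        /^[^ \t]/ { iface = $1 }                 # header line starts with the interface name
        $1 == "inet" && $2 == a { print iface; exit }
    ' "$2"
}

# Succeed if NAME is mapped to a loopback address (127.x.x.x) in the hosts
# file, like the 127.0.0.2 entry Reuti asked about.
#   $1: host name   $2: hosts file
is_loopback_mapped() {
    case $(hosts_addr "$1" "$2") in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

On the machine from the thread one could save `ifconfig` output to a scratch file (for example /tmp/ifconfig.txt, a hypothetical name), look up the act_qmaster name with `hosts_addr "$(cat /usr/global/sge-6.2u5-bin/default/common/act_qmaster)" /etc/hosts`, and pass the resulting address to `iface_for_addr`. With the /etc/hosts and ifconfig output posted above, proyecto-192.local maps to 192.168.56.101, which is eth1.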
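Reuti suggests sourcing settings.sh from a global profile so that every user gets the SGE environment, and leaves the distribution-specific location open. A minimal sketch, assuming a distribution whose login shells source /etc/profile.d/*.sh (many do, but check yours): the helper write_sge_profile and the file name sge.sh are hypothetical, not part of the SGE installer.

```shell
#!/bin/sh
# Hypothetical helper, not part of Grid Engine: write a profile snippet that
# sources the SGE settings file in every login shell.
#   $1: path to settings.sh
#   $2: directory that login shells read snippets from (e.g. /etc/profile.d; needs root)
write_sge_profile() {
    # Unquoted EOF: $1 is expanded now, so the generated file holds the literal path.
    cat > "$2/sge.sh" <<EOF
# Load the Grid Engine environment (SGE_ROOT, SGE_CELL, ports, ...).
if [ -r "$1" ]; then
    . "$1"
fi
EOF
}

# Example (as root), with the path from this thread:
# write_sge_profile /usr/global/sge-6.2u5-bin/default/common/settings.sh /etc/profile.d
```

New logins would then pick up SGE_ROOT and the other variables automatically; a shell that is already open still needs the `. settings.sh` step by hand, as Reuti describes.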
