On 02.04.2011 at 02:21, William Deegan wrote:

> Reuti,
>
> On Apr 1, 2011, at 5:12 PM, Reuti wrote:
>
>> On 02.04.2011 at 01:41, William Deegan wrote:
>>
>>> Greetings,
>>>
>>> Here's what I did.
>>> 1) unpack ge tarballs into /opt/ge on all hosts
>>> 2) configure grid master
>>> 3) scp /opt/ge/default to all hosts
>>> 4) verify ssh works back and forth among all hosts as root
>>
>> Do you need X11 forwarding?
>
> possibly.

ok.

>>
>>
>>> 5) run ./start_gui_installer -debug
>>> 6) Install all execution hosts
>>
>> There is nothing to install - well, besides adding the startup scripts to
>> /etc/init.d by `chkconfig` or other means. Then add the exechosts as
>> administrative hosts and start the execd on each of them.
>
> yes. That's just the "type the hostnames into the GUI and click install" step.
>
>>
>>
>>> This is shared nothing, so there are no filesystems shared among the
>>> systems.
>>>
>>> Are there any other configurations which I need to do?
>>>
>>> I did this a few months ago, but I'm wondering if I missed something this
>>> time around.
>>>
>>> qrsh and qlogin work for some of the hosts.
>>> qsh works for most of the hosts.
>>>
>>> I'm seeing errors like this on the qmaster host:
>>> 04/01/2011 16:22:48|schedu|qmasterhost|E|unable to find job 1197 from the scheduler order package
>>> 04/01/2011 16:23:03|schedu|qmasterhost |E|could not find job "1197" in master list
>>> 04/01/2011 16:23:03|schedu|qmasterhost |E|callback function for event "48. EVENT DEL JOB 1197.1" failed
>>>
>>> And seeing messages like this on execution hosts:
>>> 04/01/2011 16:06:49| main|exehost1|W|reaping job "1190" ptf complains: Job does not exist
>>> 04/01/2011 16:06:49| main|exehost1|E|can't open file active_jobs/1190.1/error: No such file or directory
>>
>> Are the spool files then local on all exechosts too? In a standard location
>> where the SGE owner (often sgeadmin or similar) is able to write?
>
> Yes. /opt/ge is local to all hosts.
> I ran the install as root. Should I not have?
> Is the sge owner not root in this case? If not, where is it specified?

You can keep it that way. Whoever owns $SGE_ROOT at installation time becomes the owner of SGE, and SGE will later switch the daemons from root to this effective user. If everything runs as root, that's also OK; switching to a user with lower privileges is only a safety measure.

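To check which user that is on a given host, a small script like the one below can help. It is only a sketch, assuming the usual SGE 6.x layout where the installer records the admin user in the `admin_user` line of `$SGE_ROOT/$SGE_CELL/common/bootstrap` (cell defaulting to `default`) and GNU coreutils `stat`; the function names `dir_owner` and `bootstrap_admin_user` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: compare the owner of $SGE_ROOT with the admin user the
# installer recorded. Assumes the SGE 6.x file
# $SGE_ROOT/$SGE_CELL/common/bootstrap and GNU coreutils `stat`;
# the function names are hypothetical.

# Print the user name that owns the given path.
dir_owner() {
    stat -c %U "$1"
}

# Print the value of the first "admin_user" line in a bootstrap file.
bootstrap_admin_user() {
    awk '$1 == "admin_user" { print $2; exit }' "$1"
}

# Only report when SGE_ROOT is set, so the functions can be sourced safely.
if [ -n "${SGE_ROOT:-}" ]; then
    bootstrap="$SGE_ROOT/${SGE_CELL:-default}/common/bootstrap"
    echo "owner of $SGE_ROOT: $(dir_owner "$SGE_ROOT")"
    if [ -r "$bootstrap" ]; then
        echo "admin_user in $bootstrap: $(bootstrap_admin_user "$bootstrap")"
    fi
fi
```

Run it on each exechost. If the names differ between hosts, or the execd spool directory isn't writable by that user, that could fit the `can't open file active_jobs/...` errors above, though it is only one possible cause.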
Maybe SGE created local exechost configurations that contain incompatible entries. Check with:

$ qconf -sconfl

Do you have many entries showing up there?

-- 
Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
