Thanks Reuti.
A note for others: the Qmaster directory is created automatically by OGE, but
the compute node directory must already exist, as Reuti says, *and* it must
be owned by the OGE admin user. Simply creating the directory will not work.
In my case /data/hpc/oge, where I installed OGE, is owned by ogeadmin, so
"/var/spool/oge" on the compute node needs to exist *and* be owned by ogeadmin.
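For anyone who wants to script this, here is a minimal sketch of the step described above. The function name prepare_spool_dir is made up for illustration, and the path and user are simply the ones from my setup; adjust both for yours. It has to run as root on each compute node when the owner is a different account.

```shell
# Hypothetical helper: create a local spool directory and hand it to the
# account that owns the OGE installation.
prepare_spool_dir() {
    dir=$1      # e.g. /var/spool/oge (the path used in this thread)
    owner=$2    # e.g. ogeadmin (the owner of /data/hpc/oge in this thread)

    # Create the directory (and any missing parents), then change its owner.
    mkdir -p "$dir" || return 1
    chown "$owner" "$dir" || return 1

    # Show the result so the ownership can be checked by eye.
    ls -ld "$dir"
}

# Example call, as root on a compute node:
#   prepare_spool_dir /var/spool/oge ogeadmin
```

Run it before installing or starting the execd on that node.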
Joseph
On 06/05/2012 08:53 AM, Reuti wrote:
On 05.06.2012 at 17:47, Joseph Farran wrote:
My OGE software resides on a shared NFS directory /data/hpc/oge.
When I run the ./start_gui_installer script, I set OGE up with:
Qmaster Spool: /var/spool/oge/default/spool/qmaster
Global execd: /var/spool/oge/default/spool
There is no need to have "spool" in the pathname twice.
Qmaster Spool: /var/spool/oge/qmaster
Global execd: /var/spool/oge
should do. These directories need to exist, I think. The node-specific one will
be created by OGE when the execd starts up.
Spooling: classic
The head node installs correctly, but the compute node installation fails. The
error for the compute nodes shows:
FAILED: Task failed.
Is there anything in /tmp from the execd? That is where some diagnostic
messages will be created in case it can't start up.
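A quick way to look for such files might be the sketch below. The function name is made up, and matching on "execd" in the file name is an assumption about how the files are named; check what is actually in /tmp on the failing node.

```shell
# Hypothetical helper: list regular files directly under a directory whose
# names contain "execd". The name pattern is an assumption, not a documented
# guarantee.
list_execd_tmp_logs() {
    tmp_dir=${1:-/tmp}
    find "$tmp_dir" -maxdepth 1 -type f -name '*execd*' -print
}

# Example, on the compute node where the execd failed to start:
#   list_execd_tmp_logs /tmp
```

Any file it prints can then be read with a pager or tail.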
-- Reuti
OUTPUT:
Your $SGE_ROOT directory: /data/hpc/oge
Using cell:>default<
Creating local configuration for host>compute-1-1.local<
[email protected] added "compute-1-1.local" to configuration list
Local configuration for host>compute-1-1.local< created.
Adding submit host>compute-1-1<
compute-1-1.local added to submit host list
cp /data/hpc/oge/default/common/sgeexecd /etc/init.d/sgeexecd.HPC
/usr/lib/lsb/install_initd /etc/init.d/sgeexecd.HPC
starting sge_execd
[email protected] modified "@allhosts" in host group list
[email protected] modified "all.q" in cluster queue list
got select error: Connection refused
got select error: closing "compute-1-1.local/execd/1"
Execd on host compute-1-1.local is not started!
ERROR:
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
TERM environment variable not set.
If I set up OGE with
Qmaster Spool: /var/spool/oge/default/spool/qmaster
Global execd: /data/hpc/oge/default/spool
Spooling: classic
Using the NFS share directory for "Global execd", then everything works just
fine - compute nodes are set up correctly.
What am I doing wrong?
Joseph
On 06/04/2012 02:51 PM, Reuti wrote:
Hi,
On 04.06.2012 at 22:59, Joseph Farran wrote:
When installing OGE with respect to the Spooling Configuration, one can select:
Qmaster spool directory
Global execd spool directory
I installed OGE from the head node on a shared NFS directory ( /data/oge ) and
would like the spooling to be on the head node's /var file system while leaving
the OGE executables in the NFS share directory.
Would the options be to change "Qmaster spool directory" to something like
"/var/oge"
Yes, or /var/spool/oge.
and leave the "Global execd spool directory" as is which is the shared NFS
directory?
Well, this could also be /var/spool/oge, then it would be local on each node.
http://arc.liv.ac.uk/SGE/howto/nfsreduce.html
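To confirm that a given spool directory really is local on a node and not on the NFS share, one could check its filesystem type. This is a rough sketch that assumes GNU coreutils stat; the function names are made up for illustration.

```shell
# Print the filesystem type of the filesystem holding the given path
# (GNU stat: -f queries the filesystem, %T is the type in readable form).
fs_type() {
    stat -f -c %T "$1"
}

# Succeed (exit status 0) if the path lives on an NFS mount.
is_on_nfs() {
    case "$(fs_type "$1")" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Example, on a compute node:
#   if is_on_nfs /var/spool/oge; then echo "on NFS"; else echo "not on NFS"; fi
```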
-- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users