This Request For Comments follows on from a couple of recent threads pertaining to small changes to the Grid Engine build-system, mostly for the packaging of Grid Engine 8.1.3 as RPMs.
This RFC notes that:
1) The automated installation of a qmaster, driven by a
user-supplied config file, is already possible, e.g.
cd $SGE_ROOT
./inst_sge -m -auto /opt/sge/util/install_modules/inst_mycluster.conf
2) Within such an automated qmaster installation, there already exists
the ability to install execds remotely onto a list of supplied
machines, where network connectivity to them is available.
3) It is already possible to set up machines running an execd without
needing the shared filesystems created by a qmaster installation.
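For readers who have not used the automated install, the config file referred to in point 1 is a shell-style file of VARIABLE="value" assignments. The fragment below is a sketch only: the entry names are assumed to match those in the template config file shipped in $SGE_ROOT/util/install_modules/, which remains the authoritative list, and every value (ports, admin user, paths, host names) is a placeholder. EXEC_HOST_LIST is assumed to be the entry that drives the remote execd installs of point 2.

```shell
# Sketch of a few automated-install config entries (placeholder values).
# Entry names are assumed to match the shipped template; check it before use.
SGE_ROOT="/opt/sge"
SGE_QMASTER_PORT="6444"
SGE_EXECD_PORT="6445"
CELL_NAME="default"
ADMIN_USER="sgeadmin"                  # placeholder admin account
QMASTER_SPOOL_DIR="/opt/sge/default/spool/qmaster"
EXECD_SPOOL_DIR="/opt/sge/default/spool"
# Assumed: hosts onto which the qmaster install attempts remote execd installs
EXEC_HOST_LIST="cent64-02.local"
```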
This RFC suggests that:
1) It should be possible to perform an automated execd installation
locally, i.e. on an execd host separate from the qmaster installation,
driven by the same user-supplied config file used for the qmaster
installation.
2) It can be desirable to set up execd installations without the shared
filesystem components that typically expose the directory
$SGE_ROOT/$SGE_CELL/common
between the admin and compute nodes of a Grid Engine cluster.
3) It can be desirable to set up execd installations before the ability
to do so from the qmaster installation is in place.
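Whether a given execd host actually sees $SGE_ROOT/$SGE_CELL/common over a shared filesystem (the situation point 2 wants to avoid) can be checked before installing. This is a sketch: it assumes GNU coreutils stat, whose -f -c %T prints the filesystem type name, and a hand-picked, non-exhaustive list of network filesystem types. Both helper names are hypothetical and not part of Grid Engine.

```shell
# Hypothetical helper: classify a filesystem type name (as printed by
# GNU `stat -f -c %T`) as network/shared (return 0) or local (return 1).
# The list of network types is an assumption and is not exhaustive.
is_network_fs_type() {
    case "$1" in
        nfs|nfs4|cifs|smb|smb2|lustre|gpfs|afs|fuse.glusterfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: return 0 if $SGE_ROOT/$SGE_CELL/common is on a
# network filesystem, 1 if it is local or absent, 2 if stat fails.
common_dir_is_shared() {
    common="$SGE_ROOT/$SGE_CELL/common"
    if [ ! -d "$common" ]; then
        echo "$common does not exist; nothing is shared yet" >&2
        return 1
    fi
    fstype=$(stat -f -c %T "$common") || return 2
    is_network_fs_type "$fstype"
}
```

For example, after SGE_ROOT=/opt/sge and SGE_CELL=default, running common_dir_is_shared && echo "common is shared" reports whether the execd host would be relying on the qmaster's shared directory.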
This RFC proposes:
1) A single, simple, one-word, "sine qua non" change to
$SGE_ROOT/util/install_modules/inst_common.sh
to facilitate automated execd installations.
2) A set of extra shell script functions, additions to both
$SGE_ROOT/util/install_modules/inst_common.sh
$SGE_ROOT/util/install_modules/inst_execd.sh
that have both informed the process so far and have seen an
automated execd installation performed within a Grid Engine
8.1.3 deployment from RPMs.
3) Some modifications to
$SGE_ROOT/inst_sge
$SGE_ROOT/util/install_modules/inst_common.sh
$SGE_ROOT/util/install_modules/inst_execd.sh
so as to make calls to the extra shell script functions and to
invoke the automated execd installation itself.
4) Some modifications to CheckBinaries() in
$SGE_ROOT/util/install_modules/inst_common.sh
to take account of the fact that installing only the RPMs needed
for either a qmaster or an execd leaves some binaries uninstalled
that the CheckBinaries() function believes need to be there.
5) A possible moving/refactoring of some shell script functions,
currently defined within
$SGE_ROOT/util/install_modules/inst_qmaster.sh
that become generic (that is, common to both qmaster and execd installs)
when both automated qmaster and execd installations can be performed.
This RFC recognises that:
1) the extra shell script functions could be placed into a new,
separate install module file below
$SGE_ROOT/util/install_modules/
perhaps
$SGE_ROOT/util/install_modules/inst_execd_auto.sh
so as to minimise the alterations to existing files until such time
as any alterations are accepted; however, that approach is yet to be
tried.
2) making some currently qmaster-specific shell script functions
generic might have been seen as a step too far, and so the
functionality of those functions is currently duplicated within
the changes.
3) the proof of concept, which the current changes demonstrate, lacks
the ability to remove the effects of an automated execd install.
4) not everyone may need this proposed functionality but, if it can
be achieved without affecting any current functionality of the Grid
Engine's installation process, then it seems worth making a request
for comments within the community able to comment from an informed
position.
5) Nothing has yet been done to validate the proposed changes for
use on a Windows platform.
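Point 1 could be kept non-intrusive by having the existing scripts source the new module only when it is present, so an unpatched tree behaves exactly as before. A minimal sketch, assuming a hypothetical helper name, load_optional_module, and the inst_execd_auto.sh file name suggested above; it is not code from the attached patches.

```shell
# Hypothetical helper: source $SGE_ROOT/util/install_modules/<name> if it
# exists. Returns 0 if the module was loaded, 1 if it is absent, so the
# caller can fall back to the existing install paths.
load_optional_module() {
    module="${SGE_ROOT:-}/util/install_modules/$1"
    if [ -f "$module" ]; then
        . "$module"
        return 0
    fi
    return 1
}
```

A caller would then do something like: load_optional_module inst_execd_auto.sh || echo "automated execd install not available".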
Proposal 1
Addition of a QMASTER_HOSTNAME variable to the list of variables that
a user-supplied config file may define.
The list to be altered is
KNOWN_CONFIG_FILE_ENTRIES_INSTALL
which is defined in the
CheckConfigFile()
function of
$SGE_ROOT/util/install_modules/inst_common.sh
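A simplified sketch of the kind of list check CheckConfigFile() performs, showing why appending one word is enough for a config file containing QMASTER_HOSTNAME to be accepted. The list is deliberately abbreviated and check_config_entries is a hypothetical name; the real list and logic live in inst_common.sh.

```shell
# Abbreviated, illustrative list; the real KNOWN_CONFIG_FILE_ENTRIES_INSTALL
# is longer. QMASTER_HOSTNAME is the proposed one-word addition.
KNOWN_CONFIG_FILE_ENTRIES_INSTALL="SGE_ROOT SGE_QMASTER_PORT SGE_EXECD_PORT CELL_NAME ADMIN_USER QMASTER_SPOOL_DIR EXECD_SPOOL_DIR EXEC_HOST_LIST QMASTER_HOSTNAME"

# Hypothetical check: return 0 if every NAME= assignment in the config
# file is a known entry; otherwise report the first unknown one, return 1.
check_config_entries() {
    for name in $(sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1"); do
        case " $KNOWN_CONFIG_FILE_ENTRIES_INSTALL " in
            *" $name "*) ;;
            *) echo "Unknown config file entry: $name" >&2; return 1 ;;
        esac
    done
    return 0
}
```

For example, check_config_entries /opt/sge/inst_vuwscifachpc64.conf would fail on QMASTER_HOSTNAME without the addition and pass with it.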
Rationale
1) The only piece of information about a Grid Engine deployment that
cannot currently be determined automatically, by a remote machine
that is to deploy an execd, ahead of performing the qmaster
installation, is the hostname of the qmaster machine itself.
2) This one change, on its own, is enough to allow for further/future
development of approaches to automating execd installations.
3) Defining a variable specifying the hostname of the qmaster machine
need not override the interactive inspection of the current
hostname, which is the mechanism by which an automated qmaster
installation currently obtains that value; indeed, specifying
one could be used as a check.
4) A failure to specify a variable for the hostname of the qmaster
machine will not break the existing automated qmaster installation
because of the interactive inspection to determine the current
hostname.
5) The shell function
CheckConfigFile()
supplied in
$SGE_ROOT/util/install_modules/inst_common.sh
defines a list of variables as
KNOWN_CONFIG_FILE_ENTRIES_INSTALL
and tests every config file variable that the user supplies, exiting
if the user-supplied config file defines a variable not in that list.
6) Changing the operation of CheckConfigFile() so that it can ignore
variables unknown to it, or variables specified as to be ignored,
would be harder to implement and/or maintain than simply adding
a variable to an existing list.
7) The name suggested is QMASTER_HOSTNAME because an existing
variable, QMASTER_HOST, is already used within the various
install_modules; however, it is obtained from the file
$SGE_ROOT/$SGE_CELL/common/act_qmaster
a file that an automated execd installation needs to populate
for itself, as opposed to relying on a qmaster install to have
provided it.
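Points 3, 4 and 7 above can be sketched together: use QMASTER_HOSTNAME when it is given, fall back to the current hostname for a qmaster install, refuse an execd install without it, and have the execd install write act_qmaster itself. Both function names are hypothetical. The hostname comparison is a literal string match, so a real check would need to cope with short versus fully qualified names, and a real install would also need to set the file's ownership for the admin user.

```shell
# Hypothetical: resolve the qmaster hostname for an automated install.
# $1 is the role being installed: "qmaster" or "execd".
resolve_qmaster_hostname() {
    role="$1"
    if [ -z "${QMASTER_HOSTNAME:-}" ]; then
        if [ "$role" = "execd" ]; then
            echo "QMASTER_HOSTNAME must be set for an execd install" >&2
            return 1
        fi
        uname -n            # qmaster: fall back to the current hostname
        return 0
    fi
    # On the qmaster itself, use the configured name as a consistency check.
    if [ "$role" = "qmaster" ] && [ "$QMASTER_HOSTNAME" != "$(uname -n)" ]; then
        echo "QMASTER_HOSTNAME ($QMASTER_HOSTNAME) does not match $(uname -n)" >&2
        return 1
    fi
    echo "$QMASTER_HOSTNAME"
}

# Hypothetical: write act_qmaster locally, as an execd install without a
# shared $SGE_ROOT/$SGE_CELL/common would have to.
write_act_qmaster() {
    common="$SGE_ROOT/$SGE_CELL/common"
    mkdir -p "$common" || return 1
    printf '%s\n' "$1" > "$common/act_qmaster"
}
```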
Proposal 2
As many people as possible take a look at the patches to
$SGE_ROOT/inst_sge
$SGE_ROOT/util/install_modules/inst_common.sh
$SGE_ROOT/util/install_modules/inst_execd.sh
that have allowed me to do the following and see what needs cleaning
up, and what might need adding to allow for use cases beyond mine,
to make them viable enough to ... errmmm ... be viable.
1) Install the 8.1.3 RPMs needed for an execd
yum localinstall \
gridengine-8.1.3-1.el6.x86_64.rpm \
gridengine-execd-8.1.3-1.el6.x86_64.rpm
2) Put a local installation config file into $SGE_ROOT
cp inst_vuwscifachpc64.conf /opt/sge/
3) Patch the three install files
patch -Np1 -i /usr/local/src/GE-8.1.3-kmb/inst_sge-8.1.3.diff
patch -Np1 -i /usr/local/src/GE-8.1.3-kmb/inst_common.sh-8.1.3.diff
patch -Np1 -i /usr/local/src/GE-8.1.3-kmb/inst_execd.sh-8.1.3.diff
4) Automatically install an execd environment
./inst_sge -x -auto /opt/sge/inst_vuwscifachpc64.conf
5) Follow the same procedure to automatically install the qmaster
from the same installation config file on another box
6) Set up any networking and firewalling required
7) On the qmaster, add the pre-built execd node into @allhosts
and make it a submit node
qconf -mhgrp @allhosts
qconf -as cent64-02.local
8) Start up the execd and see it join the cluster.
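Step 1 can be guarded by a pre-flight check that the packages are in place before patching and running inst_sge. A sketch with hypothetical function names; it assumes rpm is available on the host and that the installed package names match the 8.1.3 el6 x86_64 file names used above.

```shell
# Hypothetical: print the RPMs needed for a role, per the lists in this post.
# $1 is "execd" or "qmaster"; $2 optionally overrides version-release.arch.
required_rpms() {
    version="${2:-8.1.3-1.el6.x86_64}"
    case "$1" in
        execd)   echo "gridengine-$version gridengine-execd-$version" ;;
        qmaster) echo "gridengine-$version gridengine-qmaster-$version" ;;
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Hypothetical: return 0 if all required RPMs are installed, else list the
# missing ones on stderr and return 1 (2 for an unknown role).
missing_rpms() {
    pkgs=$(required_rpms "$@") || return 2
    missing=""
    for p in $pkgs; do
        rpm -q --quiet "$p" || missing="$missing $p"
    done
    [ -z "$missing" ] && return 0
    echo "Missing:$missing" >&2
    return 1
}
```

For example, missing_rpms execd && ./inst_sge -x -auto /opt/sge/inst_vuwscifachpc64.conf only attempts step 4 when step 1 has been done.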
Rationale
How will anyone answer the Request For Comments if you don't
look at the patches!
NOTES
It seems like a good idea to provide some notes as to how I went
about this, so here they are.
1) Any new shell script function names end in Execd.
This applies both to modified versions of shell script functions
called during a qmaster install and to any completely new functions.
This was thought to be a better way to make explicit what appears to
be needed than just modifying existing functions.
2) Identification of the path through the qmaster install functions
suggested a new set of shell script functions for the execd:
qmaster install                             execd install
MakeDirsMaster                              MakeDirsExecd
SetSpoolingOptions                          Not needed
AddBootstrap                                AddBootstrapExecd
PrintBootstrap                              PrintBootstrapExecd
InitSpoolingDatabase                        Not needed
AddConfiguration                            Not needed
AddLocalConfiguration                       Not needed
AddActQmaster                               AddActQmasterExecd
ProcessSGEClusterName qmaster               ProcessSGEClusterNameExecd execd
AddDefaultComplexes                         Not needed
AddPEFiles                                  Not needed
AddDefaultUsersets                          Not needed
AddCommonFiles                              AddCommonFilesExecd
AddJMXFiles                                 Not needed
CreateSGEStartUpScripts $euid true master   CreateSGEStartUpScripts $euid true execd
CreateSettingsFile                          CreateSettingsFileExecd
3) Where only a duplication of the qmaster install functionality
was needed, the relevant code is surrounded by a pair of comment
lines starting with "Dupl" (in some cases mis-spelt "Dpul"), e.g.
# Dupl GetQmasterSpoolDir()
QMDIR="$QMASTER_SPOOL_DIR"
$INFOTEXT -log "Using >%s< as QMASTER_SPOOL_DIR." "$QMDIR"
# Dpul End
4) The CheckBinariesExecd() shell script function teases out a little
more of the separation of those binaries that a vanilla Grid Engine
install expects to see but which might not be there, depending on
which RPMs have been installed.
A past discussion on this suggested the idea of a WARNBINFILES list,
denoting binaries that, even though not there, would not prevent a
Grid Engine from operating.
The patch for that got applied to the original CheckBinaries()
function but was found not to be enough when only the two RPMs
needed for an execd to operate
gridengine-8.1.3-1.el6.x86_64.rpm
gridengine-execd-8.1.3-1.el6.x86_64.rpm
were installed.
Similarly, a machine which only has the two RPMs needed for a
qmaster to operate
gridengine-8.1.3-1.el6.x86_64.rpm
gridengine-qmaster-8.1.3-1.el6.x86_64.rpm
will find that it lacks some binaries that are provided in the execd RPM.
5) The new PreInstallCheckExecd() function exists to check that
CheckBinariesExecd() would also work for a qmaster install, so that
it could simply replace CheckBinaries() if it is felt that the
install process needs to account for which binaries are packaged
into which RPMs.
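The split between fatal and warn-only binaries described in points 4 and 5 can be sketched as below. This is not CheckBinariesExecd() from the patches: the function name is hypothetical, and the two binary lists are passed in by the caller, since deciding which names belong in each list for a given set of RPMs is exactly the open question.

```shell
# Hypothetical: $1 = binary directory, $2 = required binaries,
# $3 = warn-only binaries (in the spirit of WARNBINFILES).
# Missing required binaries are fatal; missing warn-only ones just warn.
check_binaries_split() {
    bindir="$1"; required="$2"; warnonly="$3"
    status=0
    for f in $required; do
        if [ ! -x "$bindir/$f" ]; then
            echo "ERROR: missing required binary $bindir/$f" >&2
            status=1
        fi
    done
    for f in $warnonly; do
        [ -x "$bindir/$f" ] || echo "WARNING: $bindir/$f not found" >&2
    done
    return "$status"
}
```

On an execd-only host this might be called as check_binaries_split "$SGE_ROOT/bin/lx-amd64" "sge_execd" "sge_qmaster", where the architecture directory name is an assumption for a 64-bit Linux build.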
There are three patches attached that demonstrate what I changed
and added to get to where I am - they contain a lot of "KMB" in the
comments and extra logging lines.
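For reviewers, those KMB comments and the Dupl/Dpul markers make the changes easy to locate once the patches are applied. A trivial, hypothetical helper that lists them with line numbers:

```shell
# Hypothetical: print every comment line carrying a KMB, Dupl or Dpul
# marker in the given files (grep prefixes line numbers, and file names
# when more than one file is given).
list_patch_markers() {
    grep -nE '#.*(KMB|Dupl|Dpul)' "$@"
}
```

Run from $SGE_ROOT as, e.g., list_patch_markers inst_sge util/install_modules/inst_common.sh util/install_modules/inst_execd.sh.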
I don't claim this to be THE solution, but it seems to have worked for
me and, if there's felt to be a need to allow the Grid Engine to be
installed in this way, then maybe it could be a start.
Kevin M. Buckley
eScience Consultant
School of Engineering and Computer Science
Victoria University of Wellington
New Zealand
inst_sge-8.1.3-kmb02.diff
inst_common.sh-8.1.3-kmb02.diff
inst_execd.sh-8.1.3-kmb02.diff
users mailing list
https://gridengine.org/mailman/listinfo/users
