This Request For Comments follows on from a couple of recent threads
pertaining to small changes to the Grid Engine build-system, mostly
for the packaging of Grid Engine 8.1.3 as RPMs.

This RFC notes that:

1) The automated installation of a qmaster, driven by a
   user-supplied config file, is already possible, e.g.

   cd $SGE_ROOT
   ./inst_sge -m -auto /opt/sge/util/install_modules/inst_mycluster.conf

2) Within such an automated qmaster installation, there already exists
  the ability to remotely install execds onto a list of supplied
  machines, provided that network connectivity to them is available.

3) It is already possible to set up machines running an execd without
    needing shared filesystems created by a qmaster installation.



This RFC suggests that:

1) It should be possible to do an automated execd installation
    locally, i.e., on an execd host separate from the qmaster host,
    driven by the same user-supplied config file used for a qmaster
    installation.

2) It can be desirable to set up execd installations without the shared
    filesystem components that typically expose the directory

    $SGE_ROOT/$SGE_CELL/common

    between the admin and compute nodes in a Grid Engine.

3) It can be desirable to set up execd installations before the ability
    to do this from the qmaster installation is in place.



This RFC proposes:

1) A single, simple, one-word, "sine qua non" change to

$SGE_ROOT/util/install_modules/inst_common.sh

to facilitate automated execd installations.


2) A set of extra shell script functions, additions to both

$SGE_ROOT/util/install_modules/inst_common.sh
$SGE_ROOT/util/install_modules/inst_execd.sh

that have both informed the process so far and have been used to
perform an automated execd installation within a Grid Engine
8.1.3 deployment from RPMs.


3) Some modifications to

   $SGE_ROOT/inst_sge
   $SGE_ROOT/util/install_modules/inst_common.sh
   $SGE_ROOT/util/install_modules/inst_execd.sh

   so as to make calls to the extra shell script functions, and the
   invocation of the automated execd installation itself

4) Some modifications to CheckBinaries()  in

   $SGE_ROOT/util/install_modules/inst_common.sh

   to take account of the fact that installing only the RPMs needed
   for either a qmaster or an execd leaves some binaries uninstalled
   that the CheckBinaries() function believes need to be there.

5) A possible moving/refactoring of some shell script functions,
    currently defined within

   $SGE_ROOT/util/install_modules/inst_qmaster.sh

   that become generic (or rather, common to both qmaster and execd
   installs) when both automated qmaster and execd installations can
   be performed.



This RFC recognises that:

1) the extra shell script functions could be placed into a new
    separate install module file below

   $SGE_ROOT/util/install_modules/

   perhaps

   $SGE_ROOT/util/install_modules/inst_execd_auto.sh

   so as to minimise the alterations to existing files until such time
   as any alterations are accepted; however, that approach is yet to be
   tried.

2) the making generic of some currently qmaster-specific shell
   script functions might have been seen as a step too far and so
   the functionality of those scripts is currently duplicated within
   the changes.

3) the proof of concept, which the current changes demonstrate, lacks
   the ability to remove the effects of an automated execd install.

4) not everyone may need this proposed functionality but, if it can
   be achieved without affecting any current functionality of the Grid
   Engine's installation process, then it seems worth making a request
   for comments within the community able to comment from an informed
   position.

5) Nothing has been done as yet to validate the proposed changes for
   use on a Windows platform.


Proposal 1

Addition of a QMASTER_HOSTNAME variable to the list of variables that
a user-supplied config file may define.

The list to be altered is

KNOWN_CONFIG_FILE_ENTRIES_INSTALL

which is defined in the

CheckConfigFile()

function of

$SGE_ROOT/util/install_modules/inst_common.sh

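
To make the effect of that one-word change concrete, here is a minimal
sketch of a known-entries check. It is not the actual CheckConfigFile()
code: the helper name check_config_entries is hypothetical, and the list
is cut down to a handful of entries for illustration, with
QMASTER_HOSTNAME added as proposed.

```shell
#!/bin/sh
# Illustrative subset only; the real list in CheckConfigFile() is longer.
# QMASTER_HOSTNAME is the proposed addition.
KNOWN_CONFIG_FILE_ENTRIES_INSTALL="SGE_ROOT CELL_NAME ADMIN_USER EXECD_SPOOL_DIR QMASTER_HOSTNAME"

# check_config_entries FILE   (hypothetical helper, not in inst_common.sh)
# Prints every variable name that is not in the known list and returns 1
# if any were found, 0 otherwise.
check_config_entries() {
    conf_file=$1
    status=0
    # Extract NAME from each NAME=value line; comments and blanks are skipped.
    for name in $(sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$conf_file"); do
        case " $KNOWN_CONFIG_FILE_ENTRIES_INSTALL " in
            *" $name "*) ;;    # known entry, accept it
            *) echo "unknown config entry: $name"; status=1 ;;
        esac
    done
    return $status
}
```

With QMASTER_HOSTNAME in the list, a config file that defines it passes
the check; remove it from the list and the same file is rejected, which
is the behaviour that currently blocks a shared qmaster/execd config.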

Rationale

1) The only piece of information about a Grid Engine deployment that
   cannot currently be determined automatically, by a remote machine
   that is to deploy an execd, ahead of performing the qmaster
   installation, is the hostname of the qmaster machine itself.

2) This one change, on its own, is enough to allow for further/future
   development of approaches to automating execd installations.

3) Defining a variable specifying the hostname of the qmaster machine
   need not override the interactive inspection of the current
   hostname, which is the mechanism by which an automated qmaster
   installation currently obtains that value, and indeed, specifying
   one could be used as a check.

4) A failure to specify a variable for the hostname of the qmaster
   machine will not break the existing automated qmaster installation
   because of the interactive inspection to determine the current
   hostname.

5) The shell function

CheckConfigFile()

supplied in

$SGE_ROOT/util/install_modules/inst_common.sh

defines a list of variables as

KNOWN_CONFIG_FILE_ENTRIES_INSTALL

and tests every config file variable that the user supplies, exiting
if the user-supplied config file defines a variable not in that list.

6) Changing the operation of CheckConfigFile() so that it can ignore
   variables unknown to it, or that are specified as to be ignored,
   would be harder to implement and/or maintain, than simply adding
   a variable to an existing list.

7) The name suggested is QMASTER_HOSTNAME, because an existing
   variable, QMASTER_HOST, is already used within the various
   install_modules; however, it is obtained from the file

   $SGE_ROOT/$SGE_CELL/common/act_qmaster

   a file that an automated execd installation needs to populate
   for itself, as opposed to relying on a qmaster install to have
   provided it.
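
As a rough sketch of what "populating act_qmaster for itself" could look
like on an execd host, assuming QMASTER_HOSTNAME has been read from the
config file. The helper name write_act_qmaster is hypothetical and is not
taken from the attached patches.

```shell
#!/bin/sh
# write_act_qmaster SGE_ROOT SGE_CELL QMASTER_HOSTNAME   (hypothetical helper)
# Creates $SGE_ROOT/$SGE_CELL/common/act_qmaster locally instead of
# relying on a qmaster install having provided it.
write_act_qmaster() {
    sge_root=$1
    sge_cell=$2
    qmaster_hostname=$3

    if [ -z "$qmaster_hostname" ]; then
        echo "QMASTER_HOSTNAME is not set in the config file" >&2
        return 1
    fi

    common_dir="$sge_root/$sge_cell/common"
    mkdir -p "$common_dir" || return 1

    # act_qmaster holds a single line: the name of the qmaster host.
    printf '%s\n' "$qmaster_hostname" > "$common_dir/act_qmaster"
}
```

Called as, say, write_act_qmaster /opt/sge default "$QMASTER_HOSTNAME",
this gives the existing code that reads QMASTER_HOST from act_qmaster
something to find, without a shared common directory.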


Proposal 2

As many people as possible take a look at the patches to

   $SGE_ROOT/inst_sge
   $SGE_ROOT/util/install_modules/inst_common.sh
   $SGE_ROOT/util/install_modules/inst_execd.sh

that have allowed me to do the following and see what needs cleaning
up, and what might need adding to allow for use cases beyond mine,
to make them viable enough to ... errmmm ... be viable.

1) Install the 8.1.3 RPMs needed for an execd

  yum localinstall \
  gridengine-8.1.3-1.el6.x86_64.rpm \
  gridengine-execd-8.1.3-1.el6.x86_64.rpm

2) Put a local installation config file into the SGE_ROOT

  cp inst_vuwscifachpc64.conf /opt/sge/

3) Patch the three install files

   patch -Np1 -i /usr/local/src/GE-8.1.3-kmb/inst_sge-8.1.3.diff
   patch -Np1 -i /usr/local/src/GE-8.1.3-kmb/inst_common.sh-8.1.3.diff
   patch -Np1 -i /usr/local/src/GE-8.1.3-kmb/inst_execd.sh-8.1.3.diff

4) Automatically install an execd environment

   ./inst_sge -x -auto /opt/sge/inst_vuwscifachpc64.conf

5) Follow the same procedure to automatically install the qmaster
    from the same installation config file on another box

6) Set up any networking and firewalling required

7) On the qmaster, add the pre-built execd node into @allhosts
    and make it a submit node

   qconf -mhgrp @allhosts
   qconf -as cent64-02.local

8) Start up the execd and see it join the cluster.
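
Steps 1 to 4 above could be strung together along the following lines.
This is only a sketch: the wrapper name install_execd_auto and the
DRY_RUN switch are hypothetical, the patch file names follow step 3, and
it assumes the patches are applied from within $SGE_ROOT. By default it
only prints the commands it would run.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# install_execd_auto CONF PATCH_DIR   (hypothetical wrapper for steps 1-4)
install_execd_auto() {
    conf=$1
    patch_dir=$2
    sge_root=${SGE_ROOT:-/opt/sge}

    # Step 1: the two RPMs an execd needs.
    run yum localinstall gridengine-8.1.3-1.el6.x86_64.rpm \
        gridengine-execd-8.1.3-1.el6.x86_64.rpm &&
    # Step 2: put the config file into SGE_ROOT.
    run cp "$conf" "$sge_root/" &&
    # Step 3: patch the three install files (assumed to be run in SGE_ROOT).
    run cd "$sge_root" &&
    for f in inst_sge inst_common.sh inst_execd.sh; do
        run patch -Np1 -i "$patch_dir/$f-8.1.3.diff" || return 1
    done &&
    # Step 4: the automated execd install itself.
    run ./inst_sge -x -auto "$sge_root/$(basename "$conf")"
}
```

Running install_execd_auto inst_vuwscifachpc64.conf /usr/local/src/GE-8.1.3-kmb
with DRY_RUN left at 1 lists the commands for review; steps 5 to 8 still
need doing by hand on the qmaster.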



Rationale

How will anyone answer the Request For Comments if they don't
look at the patches!


NOTES

It seems like a good idea to provide some notes as to how I went
about this, so here they are.


1) Any new shell script function names to end in Execd

   This applies to modified versions of shell script functions called
   during a qmaster install and any completely new functions.

   This was thought to be a better way to make explicit what appears to
   be needed than just modifying existing functions.

2) Identification of the path through the qmaster install functions
    suggested a new set of shell script functions for the execd:


 Existing call                               For execd    New execd function
 -----------------------------------------   ----------   --------------------------------
 MakeDirsMaster                                           MakeDirsExecd
 SetSpoolingOptions                          Not Needed
 AddBootstrap                                             AddBootstrapExecd
      PrintBootstrap                                      PrintBootstrapExecd
 InitSpoolingDatabase                        Not Needed
 AddConfiguration                            Not Needed
 AddLocalConfiguration                       Not Needed
 AddActQmaster                                            AddActQmasterExecd
 ProcessSGEClusterName qmaster                            ProcessSGEClusterNameExecd execd
 AddDefaultComplexes                         Not Needed
 AddPEFiles                                  Not Needed
 AddDefaultUsersets                          Not Needed
 AddCommonFiles                                           AddCommonFilesExecd
 AddJMXFiles                                 Not Needed
 CreateSGEStartUpScripts $euid true master
 CreateSGEStartUpScripts $euid true execd
 CreateSettingsFile                                       CreateSettingsFileExecd


3) Where only a duplication of the functionality of the qmaster
   install code was needed, the relevant code is surrounded by a pair
   of comment lines starting with Dupl (or, in some cases, Dupl
   mis-spelt as Dpul), e.g.

# Dupl    GetQmasterSpoolDir()
           QMDIR="$QMASTER_SPOOL_DIR"
           $INFOTEXT -log "Using >%s< as QMASTER_SPOOL_DIR." "$QMDIR"
# Dpul End


4) The CheckBinariesExecd() shell script function teases out a bit
    more of the separation of which binaries, that a vanilla Grid
    Engine install thinks need to be seen, might not be there
    depending on which RPMs have been installed.

A past discussion on this suggested the idea of a WARNBINFILES list,
denoting binaries that, even though not there, would not prevent a
Grid Engine from operating.

The patch for that got applied to the original CheckBinaries()
function but was found not to be enough when only the two RPMs
needed for an execd to operate

gridengine-8.1.3-1.el6.x86_64.rpm
gridengine-execd-8.1.3-1.el6.x86_64.rpm

were installed.

Similarly, a machine which only has the two RPMs needed for a
qmaster to operate

gridengine-8.1.3-1.el6.x86_64.rpm
gridengine-qmaster-8.1.3-1.el6.x86_64.rpm

will find that it lacks some binaries that are provided in the execd RPM.

5) The new PreInstallCheckExecd() function exists to allow for
   checking whether CheckBinariesExecd() would also work for a
   qmaster install, and so could simply replace CheckBinaries() if
   the distinction between which binaries are packaged into which
   RPMs is felt to need accounting for in the install process.
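
The WARNBINFILES idea from note 4 can be sketched as a check that takes
two lists: binaries whose absence is fatal, and binaries whose absence
only earns a warning. This is an illustration of the idea rather than the
code in the patches; the function name check_binaries_sketch and its
arguments are hypothetical, and which binaries belong in which list for a
given RPM set is left to the caller.

```shell
#!/bin/sh
# check_binaries_sketch BIN_DIR "REQUIRED..." "WARN_ONLY..."   (hypothetical)
# Returns 1 if any required binary is missing; a missing warn-only
# binary produces a warning but does not affect the return status.
check_binaries_sketch() {
    bin_dir=$1
    required=$2
    warn_only=$3
    missing=0

    for b in $required; do
        if [ ! -x "$bin_dir/$b" ]; then
            echo "ERROR: missing required binary: $b"
            missing=1
        fi
    done

    for b in $warn_only; do
        [ -x "$bin_dir/$b" ] || echo "WARNING: binary not installed: $b"
    done

    return $missing
}
```

On an execd-only host one might call it with "sge_execd sge_shepherd" as
the required list and "sge_qmaster" as warn-only, and the other way round
on a qmaster-only host; that split is an assumption about the packaging,
not something read out of the RPM spec.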


There are three patches attached that demonstrate what I changed
and added to get to where I am - they contain a lot of "KMB" in the
comments and extra logging lines.

I don't claim this to be THE solution, but it seems to have worked for
me and, if there's felt to be a need to allow the Grid Engine to be
installed in this way, then maybe it could be a start.


Kevin M. Buckley

eScience Consultant
School of Engineering and Computer Science
Victoria University of Wellington
New Zealand

Attachment: inst_sge-8.1.3-kmb02.diff
Attachment: inst_common.sh-8.1.3-kmb02.diff
Attachment: inst_execd.sh-8.1.3-kmb02.diff
