Re: RFC: The proposed INSPIRE deployment model

Tibor Simko Mon, 6 Sep 2010 02:30:22 +0200

On Wed, 11 Aug 2010, Joe Blaylock wrote:
> So: The proposed INSPIRE deployment model:
> http://invenio-software.org/wiki/INSPIRE/DeploymentModel
>
> Discuss!


In summary, I see three kinds of issues with this proposal: while it
describes well a development collaboration practice in a multi-person
environment, it mixes to some degree (1) development and production
deployment, (2) the pull-on-demand and shared-push collaboration models,
and (3) the Invenio and INSPIRE repo histories.  Let me illustrate some
of these points, and let us discuss these and other related ones IRL.

- Firstly, it is obvious that the current situation (using master branch
  only) is not ideal for progressive INSPIRE instance updating.  I have
  outlined my picture in emails like <[email protected]> of
  14 Jan 2010 and in a few other emails related to branching; see also:
  
  <https://twiki.cern.ch/twiki/bin/view/CDS/GitWorkflow#Understanding_official_repo_bran>
  <http://simko.info/tmp/invenio-technology-openlab-aug-2010.pdf>

  Briefly, there are (will be) branches like:

    - maint-0.99
    - maint-1.0
    - maint-1.1
    - master
    - next

  The TEST and PROD boxes should be run out of a `maint-X.Y' branch,
  never out of the `master' branch.  (Which raises the question of when
  we shall have the long overdue maint-1.0 branch, but let me not
  digress.)

  There are basically no conf/DB updates on a given maint-X.Y branch, so
  deployment is ultra easy.  To deploy code on a TEST/PROD machine is
  almost as easy as a post-commit hook, modulo any running bibsched
  tasks one has to pay attention to.  It may be sufficient to have a
  site software upgrade job running once per day.

  The DEV box is usually run out of master and so may have non-trivial
  DB/conf update issues.  It differs from TEST/PROD in this regard, and
  in its bleeding edge nature as well.  Yet in places the proposal
  speaks of the deployment strategy for DEV/TEST/PROD in a similar way,
  which is the first mix I alluded to in the introduction.  While we
  could deploy johndoe/feature-a on DEV to see how it behaves, we should
  not do it on TEST/PROD if it is not ready in one of the maint-X.Y
  branches.
  (Yes, refersto/citedby, I know.)
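
As a rough sketch of the branch rule above (this is not an existing
Invenio tool; the helper name `may_deploy' and the box labels passed as
strings are assumptions made for illustration), a site upgrade job could
refuse anything but a maint-X.Y branch on TEST/PROD:

```python
import re

# Branch names of the form maint-X.Y, e.g. maint-0.99 or maint-1.0.
MAINT_BRANCH = re.compile(r"^maint-\d+\.\d+$")


def may_deploy(box, branch):
    """Return True if `branch` may be deployed on `box` (hypothetical helper)."""
    if box in ("TEST", "PROD"):
        # TEST/PROD run only out of maintenance branches, never master.
        return MAINT_BRANCH.match(branch) is not None
    # DEV may run master or a topic branch such as johndoe/feature-a.
    return box == "DEV"
```

With this check, may_deploy("PROD", "master") is rejected while
may_deploy("DEV", "johndoe/feature-a") is allowed.  Checking for running
bibsched tasks is deliberately left out of the sketch.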

- CDS, INSPIRE, ADS, and other instance-specific repos contain branches
  following the given Invenio branches.  The INSPIRE repo codebase does
  not usually clash with the Invenio repo codebase: it mostly contains
  the UI skin on top of Invenio anyway.  I don't consider installing it
  after Invenio to be monkey-patching.  There is no overwriting to be
  done apart from an etc/HB.bfo file or two, since any INSPIRE-specific
  template files live apart.  (If some BFE elements do not, then they
  should; e.g. bfe_references.py -> bfe_INSPIRE_references.py.)
  Again, TEST/PROD are ultra safe in this area, DEV may need some
  adapting.

  Moreover, INSPIRE features that are of no interest to generic Invenio
  instances, but that touch Invenio core, currently live in Invenio in
  inactivated form, guarded by CFG_INSPIRE_SITE.  Ditto for ADS.
  Anything can be put there, anything can live inside or be called from
  inside, since it is properly protected via CFG variables.  (Even a
  statement like `if CFG_FOO_SITE: print 1/0' would be OK, if an
  instance needs it.)  Since this does not have an impact on Invenio
  core, it does not enter into usual Invenio integration matters.  (For
  INSPIRE, it would enter INSPIRE integration matters though.)
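
To make the guarding idea concrete, here is a minimal sketch in
present-day Python.  The function `format_title' is hypothetical, and
the flag is defined locally as a stand-in; in a real instance it would
come from the site configuration:

```python
# Stand-in for the site flag; in a real instance it would come from the
# configuration rather than be defined here (assumption for this sketch).
CFG_FOO_SITE = 0


def format_title(title, foo_site=None):
    """Hypothetical core function with an instance-specific branch."""
    if foo_site is None:
        foo_site = CFG_FOO_SITE
    if foo_site:
        # Instance-only code path: even something as drastic as this
        # never runs on instances that leave the flag switched off.
        return 1 / 0
    return title
```

Generic instances, which leave the flag off, never reach the guarded
branch; only an instance that switches the flag on is exposed to it.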

- Your proposal relies heavily on developers' public repo branches, but
  these are usually not fully consumable.  Recall how many iterations we
  have done for the author name tokenizer.  One can merge the branches,
  check if there is stuff to change, fix and re-merge again and again;
  but this may lead to `interesting' repo history with many intermediate
  commits.  Such a history can be pretty straightforward, which would
  then simulate the lieutenant integration practice we currently use
  (see below).  But it may also get complex, leading to problems:
  e.g. you mention that I would cherry-pick stuff afterwards from these
  branches, but it may not be possible when a branch has a complex merge
  history with complex merge conflict resolutions behind it.  Why do I
  care about repo history?  To ease the job for people hunting Invenio
  bugs in the future.  A clean commit history, where people can rely on
  individual feature branch commit nodes, leads to much easier bug
  bisecting later.  If one relies on developers' public branches, doing
  frequent merging as things are updated, one basically shifts the
  collaboration model from pull-on-demand to shared-push, to a certain
  degree.  This is the second mix.  One may as well ask why not use
  shared-push in that case.
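
Why a clean history eases bisecting can be sketched as a binary search.
The following is a simplified model on a linear list of commits, not how
git bisect is actually implemented (git works on the full commit graph).
It assumes every commit can be tested on its own and that, once the bug
appears, all later commits are bad; the names are illustrative only:

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad(commit) is true.

    `commits` is ordered oldest to newest.  The search assumes that all
    commits before the culprit are good and all commits from it on are bad.
    """
    lo, hi = 0, len(commits) - 1
    if not is_bad(commits[hi]):
        raise ValueError("the newest commit is not bad; nothing to find")
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # culprit is at mid or earlier
        else:
            lo = mid + 1    # culprit is after mid
    return commits[lo]
```

The search needs only about log2(N) tests for N commits, but only if
each commit node can be built and tested reliably.  Intermediate
fix-and-re-merge commits that do not work on their own break that
assumption.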

- Currently, the Invenio modules usually have very well defined
  `maintainers' or `responsible developers'.  People rarely have to do
  git merge johndoe/really-cool-thing-c for their work, to pick your
  example.  If they do, we currently use the `integration lieutenant'
  approach, similar to the Linux kernel.  This suits the pull-on-demand
  collaboration model well.  To illustrate with two recent cases:

   - Marko developed an auto-suggestion facility for BibEdit via
     BibKnowledge.  He pushed a branch out.  Piotr, as the maintainer of
     BibEdit, took it, integrated it into his latest work, solved some
     conflicts and otherwise modified it as he saw fit, and pushed the
     branch out.  I then integrated it without looking much at Marko's
     original branch, because Piotr served as an `integration
     lieutenant' in this case.

   - Jan L. was to enrich BibMatch with remote instance matching
     capabilities.  Jerome mentioned his almost-ready InvenioConnector
     utility.  Jan took it and included it in his branch, which was then
     integrated.  Jan served as an `integration lieutenant' in this
     case.  Since the branch was taken `as is', i.e. merged and not
     cherry-picked, Jan's name appears as a committer for this
     Jerome-authored code. (git show d8b5e5b --format=full)

  Since we usually have well-defined module maintainers or `areas of
  expertise' in the project, people rarely need to merge many branches
  from many fellow co-developers in parallel.  If we
  needed that, it would call for a collaboration model shifted more
  towards shared-push.  Since we typically don't, the pull-on-demand
  model is quite appropriate.  (Pull-on-demand has its advantages over
  shared-push that I won't discuss here, but that I mentioned elsewhere,
  e.g. <https://savannah.cern.ch/task/?13992#comment6>.)
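
The author/committer distinction from the second case above is visible
in the header that git show --format=full prints, which carries an
`Author:' and a `Commit:' line.  A small sketch that pulls the two
apart; the sample text uses made-up names and a made-up hash, not the
contents of the commit cited above:

```python
# Made-up sample in the shape of `git show --format=full` output.
SAMPLE = """\
commit 0123456789abcdef0123456789abcdef01234567
Author: Alice Author <alice@example.org>
Commit: Bob Committer <bob@example.org>

    Example subject line
"""


def author_and_committer(show_output):
    """Return (author, committer) parsed from the commit header lines."""
    fields = {}
    for line in show_output.splitlines():
        if not line.strip():
            break  # the header ends at the first blank line
        key, sep, value = line.partition(":")
        if sep and key in ("Author", "Commit"):
            fields[key] = value.strip()
    return fields.get("Author"), fields.get("Commit")
```

When the two values differ, the code was written by one person and
brought into the history by another.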

- Finally, if the INSPIRE repo contained the Invenio repo as a branch,
  chances are that the two development histories would get intermixed
  sooner or later, after some complex back-and-forth merging of
  developer public branches.  This is especially likely if
  core-touching features are not developed fully inside Invenio via
  CFG_INSPIRE_SITE, but in INSPIRE feature branches living next to it,
  as you seem to suggest in the proposal.  It may then be very hard to
  cherry-pick or merge things out of such repos.  Using git submodules
  may perhaps be more appropriate then, if we dislike the current two
  separate repos with closely-followed branches.  In any case, one
  would not want the Invenio core repo to depend on the commit history
  of an INSPIRE or an ADS specific repo, which is something that could
  happen more easily with branches than with submodules.  This is the
  third concern I mentioned in
  the introduction, but this email is getting long already, so let me
  stop here.

To sum up: I think your proposal describes well one DEV technique for
people to collaborate together, especially when more developers work on
the same feature/module.  (It kind of simulates what `integration
lieutenants' are doing behind the scenes already now, with the current
collaboration model, on their own personal DEV servers.)  But I'm not
sure it would be advantageous to instantiate this scheme on such a
project-wide scale as to mutually rely on people's topic branches, for
reasons mentioned above.  The proposal seems to try to find a compromise
between the shared-push and pull-on-demand collaboration models, but it
risks leading to a hybrid model that may be more complex than either of
the two models taken alone, especially since people rarely need to pull
from others when developing features.  This is why I'd rather look at
ways to improve the current integration practice while staying with the
pull-on-demand model, which I think is more advantageous for our
project.  Basically, I think we should speed up having the maint
branches for INSPIRE TEST/PROD instances to boot, which would lead to
faster bugfix/feature deployment, leaving room for faster integration
lieutenant processing.  But let us discuss pros/cons of these various
solutions IRL.

Best regards
-- 
Tibor Simko
