On Wed, 11 Aug 2010, Joe Blaylock wrote:
> So: The proposed INSPIRE deployment model:
> http://invenio-software.org/wiki/INSPIRE/DeploymentModel
>
> Discuss!

In summary, I see three kinds of issues with this proposal: while it describes well a development collaboration practice in a multi-person environment, it mixes to some degree development and production deployment, the pull-on-demand and shared-push collaboration models, and the Invenio and INSPIRE repo histories. Let me illustrate some of these points, and let us discuss these and other related ones IRL.

- Firstly, it is obvious that the current situation (using master branch only) is not ideal for progressive INSPIRE instance updating. I have outlined my picture in emails like <[email protected]> of 14 Jan 2010 and in a few other emails related to branching; see also:

    <https://twiki.cern.ch/twiki/bin/view/CDS/GitWorkflow#Understanding_official_repo_bran>
    <http://simko.info/tmp/invenio-technology-openlab-aug-2010.pdf>

  Briefly, there are (will be) branches like:

    - maint-0.99
    - maint-1.0
    - maint-1.1
    - master
    - next

  The TEST and PROD boxes should be run out of a `maint-X.Y' branch, never out of the `master' branch. (Which raises the question of when we shall have the long overdue maint-1.0 branch, but let me not digress.) There are basically no conf/DB updates on a given maint-X.Y branch, so deployment is ultra easy. To deploy code on a TEST/PROD machine is almost as easy as a post-commit hook, modulo any running bibsched tasks one has to pay attention to. It may be sufficient to have a site software upgrade job running once per day.

  The DEV box is usually run out of master and so may have non-trivial DB/conf update issues. It differs from TEST/PROD in this regard, and in its bleeding edge nature as well. Yet the proposal speaks of the deployment strategy for DEV/TEST/PROD in a similar way in places, which is the first mix I alluded to in the introduction. While we could deploy johndoe/feature-a on DEV to see how it behaves, we should not do it on TEST/PROD if it is not ready in one of the maint-X.Y branches. (Yes, refersto/citedby, I know.)

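  To make the branch rule above concrete, here is a minimal sketch in Python. The function name `may_deploy' and the environment labels are hypothetical, invented for illustration only; nothing like this exists in Invenio, and a real check would probably live in whatever script runs the daily upgrade job.

```python
import re

# Hypothetical helper, not part of Invenio: it only encodes the rule that
# TEST and PROD run out of maint-X.Y branches, while DEV may run anything.
MAINT_BRANCH = re.compile(r"^maint-\d+\.\d+$")


def may_deploy(environment, branch):
    """Return True if `branch` may be deployed on `environment`."""
    if environment == "DEV":
        # DEV is bleeding edge: master or even a topic branch is acceptable.
        return True
    if environment in ("TEST", "PROD"):
        # Stable boxes accept maintenance branches only.
        return MAINT_BRANCH.match(branch) is not None
    raise ValueError("unknown environment: %r" % (environment,))
```

  With this rule, johndoe/feature-a is accepted on DEV but rejected on TEST/PROD until it has landed in one of the maint-X.Y branches.
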
- CDS, INSPIRE, ADS, and other instance-specific repos contain branches following the given Invenio branches. The INSPIRE repo codebase does not usually clash with the Invenio repo codebase: it mostly contains the UI skin on top of Invenio anyway. I don't consider installing it after Invenio as monkey-patching. There is no overwriting to be done apart from some etc/HB.bfo file or two, since any INSPIRE-specific template files are living apart. (If some BFE elements are not, then they should be; e.g. bfe_references.py -> bfe_INSPIRE_references.py.) Again, TEST/PROD are ultra safe in this area, DEV may need some adapting.

  Moreover, INSPIRE features that are of no interest to generic Invenio instances, but that touch Invenio core, are currently living in Invenio as inactivated, via CFG_INSPIRE_SITE. Ditto for ADS. Anything can be put there, anything can live inside or be called from inside, since it is properly protected via CFG variables. (Even a statement like `if CFG_FOO_SITE: print 1/0' would be OK, if an instance needs it.) Since this does not have an impact on Invenio core, it does not enter into usual Invenio integration matters. (For INSPIRE, it would enter INSPIRE integration matters though.)

- Your proposal relies heavily on developers' public repo branches, but these are usually not fully consumable. Recall how many iterations we have done for the author name tokenizer. One can merge the branches, check if there is stuff to change, fix and re-merge again and again; but this may lead to an `interesting' repo history with many intermediate commits. Such a history can be pretty straightforward, which would then simulate the lieutenant integration practice we currently use (see below). But it may also get complex, leading to problems: e.g. you mention that I would cherry-pick stuff afterwards from these branches, but it may not be possible when a branch has some complex merge history with complex merge conflict resolutions behind it.

  Why do I care about repo history?
  To ease the job for people hunting Invenio bugs in the future. A clean commit history, where people can rely on individual feature branch commit nodes, leads to much easier bug bisecting later.

  If one relies on developers' public branches, doing frequent merging as things are updated, one basically shifts the collaboration model from pull-on-demand to shared-push, to a certain degree. This is the second mix. One might as well ask why not use shared-push in that case.

- Currently, the Invenio modules usually have very well-defined `maintainers' or `responsible developers'. People rarely have to do git merge johndoe/really-cool-thing-c for their work, to pick your example. If they do, we currently use the `integration lieutenant' approach, similar to the Linux kernel. This suits the pull-on-demand collaboration model well. To illustrate it on two recent cases:

    - Marko developed an auto-suggestion facility for BibEdit via BibKnowledge. He pushed a branch out. Piotr, as the maintainer of BibEdit, took it, integrated it into his latest work, solved some conflicts and otherwise modified it as he saw fit, and pushed the branch out. I then integrated it without looking much at Marko's original branch, because Piotr served as an `integration lieutenant' in this case.

    - Jan L. was to enrich BibMatch with remote instance matching capabilities. Jerome mentioned his almost-ready InvenioConnector utility. Jan took it and included it in his branch, which was then integrated. Jan served as an `integration lieutenant' in this case. Since the branch was taken `as is', i.e. merged and not cherry-picked, Jan's name appears as a committer for this Jerome-authored code. (git show d8b5e5b --format=full)

  Since we usually have well-defined module maintainers or `areas of expertise' in the project, it is rarely needed for people to merge many branches from many fellow co-developers in parallel. If we needed that, it would call for a collaboration model shifted more towards shared-push.
  Since we typically don't, the pull-on-demand model is quite appropriate. (Pull-on-demand has its advantages over shared-push that I won't discuss here, but that I mentioned elsewhere, e.g. <https://savannah.cern.ch/task/?13992#comment6>.)

- Finally, if the INSPIRE repo contained the Invenio repo as a branch, chances are that after some complex back-and-forth merging of developers' public branches the two development histories would get intermixed sooner or later, especially if core-touching features are developed not fully inside Invenio via CFG_INSPIRE_SITE but in INSPIRE feature branches living next to it, as you seem to suggest in the proposal. It may then be very hard to cherry-pick or merge things out of such repos. Using git submodules may perhaps be more appropriate then, if we dislike the current two separated repos with closely-followed branches. In any case, one would not want the Invenio core repo to depend on the commit history of an INSPIRE-specific or ADS-specific repo, which is something that could happen more easily with branches than with submodules.

  This is the third concern I mentioned in the introduction, but this email is getting long already, so let me stop here.

To sum up: I think your proposal describes well one DEV technique for people to collaborate together, especially when more developers work on the same feature/module. (It kind of simulates what `integration lieutenants' are already doing behind the scenes, with the current collaboration model, on their own personal DEV servers.) But I'm not sure it would be advantageous to instantiate this scheme on such a project-wide scale as to mutually rely on people's topic branches, for the reasons mentioned above. The proposal seems to try to find a compromise between the shared-push and pull-on-demand collaboration models, but it risks leading to a hybrid model that may be unnecessarily more complex than the two models taken alone.
Especially since people rarely need to pull from others when developing features. Which is why I'd rather look at ways to improve the current integration practice while staying with the pull-on-demand model, which I think is more advantageous for our project.

Basically, I think we should speed up having the maint branches for INSPIRE TEST/PROD instances to boot, which would lead to faster bugfix/feature deployment, leaving room for faster integration lieutenant processing. But let us discuss pros/cons of these various solutions IRL.

Best regards
-- 
Tibor Simko
