On Fri, 11 Dec 2009, Benoit Thiell wrote:

> Marko Niinimaki wrote:
>> In another project, I found the following practice really useful:
>> - there is a database table called schema_version (or equivalent) that
>>   contains a version number
>> - every "make install" checks the version number and upgrades the table
>>   structure if needed
>> - every git commit that changes the database increases the version number
>
> That's approximately what I had in mind. Last weekend I wrote an
> implementation proposal as a Python script that checks the database and
> Invenio versions in the way that you described and then returns the list
> of updates to apply in order to upgrade the database. This weekend I
> will do some testing on my code and send it out on Monday.
Not sure what your script is doing, but as I said last time, it is not only the DB structure that is in play. The DB structure may stay the same while the content changes (think of Python serialized objects); some non-trivial migration scripts may need to be run; some new default values (e.g. from tabfill) may need to be inserted; etc. It is mostly we humans who must maintain this knowledge and any needed update statements. There is no need for sophisticated automated SQL deduction scripts or CREATE TABLE comparisons; I think we can simply move this knowledge out of the Makefile update statements and the RELEASE-NOTES files into the new inveniocfg options, if we want.

Concerning Marko's proposal, we kind-of sort-of use a variant of it already, if we consider that our DB schema numbers are exactly identical to the Invenio release numbers. History shows that we were not releasing frequently enough, so perhaps we should indeed adopt such a technique, where DB versions are separate from release versions and can evolve more rapidly. Though, a bleeding edge is a bleeding edge, where drastic changes are expected, so I still don't know whether it is worth writing update statements during unstable periods of development, only to rewrite them later when things change (as they are doomed to on a bleeding edge).

Plus, it is not only a question of the DB; the `etc' file formats are in play too. Hence I was looking at a common DB+etc freeze solution using the dedicated git branches v1.0, v1.1 that I mentioned previously. If we go for the parallel branch option, then those extra DB schema numbers would not be needed. Though they may still be nice to have, if we want the DB to evolve more quickly... Dunno. For example, if we introduce separate numbers for the DB schema, it would be nice to introduce separate numbers for the etc file formats, for the lib plug-in formats, etc.
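To make the combination of Marko's schema_version table and hand-maintained update knowledge concrete, here is a minimal sketch. Each upgrade step is an ordinary Python function written by a developer, so a step can run DDL, insert new default values, or rewrite serialized content while leaving the table structure untouched. Everything here is hypothetical: the `setting` table, the `UPGRADES` list, the `upgrade()` function and the step names are illustrative, and SQLite stands in for MySQL only to keep the example self-contained. This is not how Invenio or inveniocfg actually does it.

```python
import json
import sqlite3

# Hypothetical, hand-maintained upgrade steps (not Invenio code).
# UPGRADES[i] takes the database from schema version i to version i + 1.

def create_settings_table(conn):
    # Structural change (DDL).
    conn.execute(
        "CREATE TABLE setting (name TEXT PRIMARY KEY, value TEXT NOT NULL)")

def insert_default_settings(conn):
    # New default values, in the spirit of tabfill.
    conn.executemany(
        "INSERT INTO setting (name, value) VALUES (?, ?)",
        [("languages", "en,fr"), ("formats", "hb,hd")])

def convert_values_to_json(conn):
    # Content-only change: the table structure stays the same, but every
    # stored value is re-serialized from "a,b" to a JSON list.
    rows = conn.execute("SELECT name, value FROM setting").fetchall()
    for name, value in rows:
        conn.execute("UPDATE setting SET value = ? WHERE name = ?",
                     (json.dumps(value.split(",")), name))

UPGRADES = [create_settings_table, insert_default_settings,
            convert_values_to_json]

def current_version(conn):
    """Return the stored schema version, initializing it to 0 if absent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    if row is None:
        conn.execute("INSERT INTO schema_version (version) VALUES (0)")
        conn.commit()
        return 0
    return row[0]

def upgrade(conn):
    """Apply every pending step in order; a no-op if already up to date."""
    version = current_version(conn)
    for target, step in enumerate(UPGRADES[version:], start=version + 1):
        step(conn)
        conn.execute("UPDATE schema_version SET version = ?", (target,))
        conn.commit()  # record progress after each step
    return current_version(conn)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(upgrade(conn))  # 3
    print(upgrade(conn))  # 3 again: nothing left to apply
    print(conn.execute(
        "SELECT value FROM setting WHERE name = 'languages'").fetchone()[0])
    # ["en", "fr"]
```

Because every step is plain code rather than the result of a CREATE TABLE comparison, a content-only change such as `convert_values_to_json` gets its own version number just like a DDL change, and running `upgrade()` on every "make install" is safe since already-applied steps are skipped.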
All in all, I think that the branch technique may help us tackle all these issues at once via a single branch number, so this is still the option I prefer, I guess. It would mean having one git/master bleeding-edge branch plus a set of highly stable or semi-stable v1.0, v1.1, v1.2 branches. But we should really be releasing early and releasing often if we want to make effective use of such a git branching technique.

(brainstorming out loud, keep it going)

Best regards
--
Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
