On Fri, 11 Dec 2009, Benoit Thiell wrote:
> Marko Niinimaki wrote:
>> In another project, I found the following practice really useful:
>> -there is a database table called schema_version (or equivalent) that
>> contains a version number
>> -every "make install" checks the version number and upgrades the table
>> structure if needed
>> -every git submit that changes the database increases the version number
>
> That's approximately what I had in mind. Last week-end I wrote an 
> implementation proposal as a Python script that checks the database and 
> Invenio versions in the way that you described and then returns the list 
> of updates to apply in order to upgrade the database. This week-end I 
> will do some testing on my code and send it out on Monday.

Not sure what your script is doing, but as I said last time, it is not
only DB structure that is in the game: the DB structure may be the same
but the content may be different (think Python serialized objects); some
non-trivial migration scripts may be needed to run; or some new default
values (e.g. from tabfill) may be needed to get inserted, etc.  It is
mostly we humans who must be maintaining this knowledge and any needed
update statements.  There is no need for sophisticated automated SQL
deduction scripts or CREATE TABLE comparisons; I think we can simply
move maintaining this knowledge from Makefile update statements and from
RELEASE-NOTES files into the new inveniocfg options, if we want.

Concerning Marko's proposal, we kind-of sort-of use its variant in a
sense, if we consider that our DB schema numbers are exactly identical
to the Invenio release numbers.  The history shows that we were not
releasing frequently enough, so perhaps we should indeed adopt such a
technique, where DB versions are separate and can evolve more rapidly
than release versions.  Though, a bleeding edge is a bleeding edge,
where drastic changes are expected, so I still dunno whether it is worth
writing update statements for unstable periods of development and then
rewriting them later when things change (as they are doomed to do on the
bleeding edge).  Plus it is not only a question of DB: the `etc' file
formats are also in the game.  Hence I was
looking at a common db+etc freeze solution by using those dedicated git
branches v1.0, v1.1 that I mentioned previously.  If we go for the
parallel branch option, then those extra DB schema numbers would not be
needed.  Though they may still be nice to have, if we want the DB to be
able to evolve more quickly... Dunno.  For example, if we introduce separate
numbers for DB schema, it would be nice to introduce separate numbers
for the etc file formats, for the lib plug-in formats, etc.  All in all,
I think that the branch technique may help us in tackling all these
issues at once via a single branch number, so this option is still what
I prefer, I guess.  It would mean having one git/master bleeding edge
branch plus a set of highly-stable or semi-stable v1.0, v1.1, v1.2
branches.  But we should be really releasing early, releasing often, if
we want to make an effective use of such a git branching technique.
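The trade-off between several independent format numbers and a single branch number might be sketched as follows. The component names and the "vX.Y" branch parsing are hypothetical; the point is only that separate counters need one check per component, whereas a frozen branch needs a single comparison.

```python
def outdated_components(installed, required):
    """Return components whose installed format number lags the required one.

    Hypothetical per-component counters, e.g. {"db": 5, "etc": 2}.
    A component missing from `installed` is treated as version 0.
    """
    return sorted(name for name, needed in required.items()
                  if installed.get(name, 0) < needed)

def parse_branch(name):
    """Turn a hypothetical branch name such as "v1.1" into (1, 1)."""
    return tuple(int(part) for part in name.lstrip("v").split("."))

def branch_outdated(installed_branch, required_branch):
    # With one branch number, DB, etc and plug-in formats are frozen
    # together, so a single comparison covers all of them.
    return parse_branch(installed_branch) < parse_branch(required_branch)
```

Comparing parsed tuples rather than raw strings keeps "v1.10" ordered after "v1.9".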

(brainstorming out loud, keep it going)

Best regards
-- 
Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
