Re: Why running the Git version of Invenio on a non-Atlantis collection might lead to problems.

Tibor Simko Thu, 10 Dec 2009 22:38:17 +0100

Hi Benoit:

On Thu, 10 Dec 2009, Benoit Thiell wrote:
> Additions to the latest Makefile update script (currently v0.99.1) are
> added regularly but differences between the code and the MySQL table
> updating scripts remain


Yes, this is because ``make update-v0.x.y-tables'' commands are meant
for stable-to-stable version updates only, and as such these Makefile
targets are usually checked only before a stable release is due.  They
are not meant for git/master updates in between releases, which is more
of a bleeding-edge thing.  E.g. sometimes reindexing is needed,
sometimes migration kits are needed, and I'm not sure it is worth
maintaining these targets for every moment in time during the
unstable period.

What is often done though is that the development version number
(YYYYMMDD) in configure.ac is bumped in case of significant DB table
changes or updates to the file organization.  So one can in principle
check whether the dev version number in configure.ac has changed, and
then diff tabcreate.sql between the two given bleeding-edge dates.
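
To illustrate the diffing step, here is a minimal Python sketch.  It is
not part of Invenio; the function names are made up for this example.
It assumes you have already obtained the two versions of tabcreate.sql
as strings (for example via ``git show <rev>:<path to tabcreate.sql>'')
and that every statement ends with a semicolon.

```python
import re


def table_definitions(sql):
    """Map table name -> whitespace-normalised CREATE TABLE statement."""
    # Drop whole-line SQL comments so that comment edits are not reported.
    sql = re.sub(r"^[ \t]*(--|#).*$", "", sql, flags=re.MULTILINE)
    tables = {}
    for statement in sql.split(";"):
        match = re.search(
            r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?`?(\w+)`?",
            statement, re.IGNORECASE)
        if match:
            tables[match.group(1)] = " ".join(statement.split())
    return tables


def changed_tables(old_sql, new_sql):
    """Report tables added, removed or altered between two SQL dumps."""
    old, new = table_definitions(old_sql), table_definitions(new_sql)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "altered": sorted(t for t in set(old) & set(new)
                          if old[t] != new[t]),
    }
```

This only tells you which tables to look at; deciding whether a change
needs an ALTER TABLE, a reindexing or a migration kit is still manual.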

That was a rough description of the current practice.  As for how to
improve it:

>   * Reject all the commits that touches the database structure without
> updating the Makefile.

This may not be practical.  What if some changes are later partially
reverted?  Then conditionals would have to be maintained for people who
have done git/master updates prior to YYYYMMDD1 but not later than
YYYYMMDD2, etc.

>   * Using the DROP TABLE statement in the update scripts should be
> allowed only if the table to remove will actually disappear from the
> system (c.f. update-v0.92.1-tables). Table definitions should be
> updated and not recreated. People might have useful information in
> there.

That DROP TABLE statement concerned only an internal indexed/ranked
data structure.  It was necessary to drop it because of a change in the
citation handling.  The RELEASE-NOTES instructions advise people to
rerun indexing in such cases.  So no valuable data is lost in this
case... just an internal structure that is regenerated as needed as
part of the upgrade.

>   * Create an automated database updating system that would rely on an
> internal database version number.

Yes, that would be nice indeed.  But it may not be practical to maintain
this fully for all the in-between-stable-releases periods anyway (see
above).  
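
For the sake of discussion, the version-number idea could be sketched
roughly as follows.  This is only an illustration under stated
assumptions: it uses Python's sqlite3 module as a stand-in for MySQL,
and the ``schema_version'' table, the MIGRATIONS list and the function
names are hypothetical, not existing Invenio code.

```python
import sqlite3

# Hypothetical ordered migrations: (target version, SQL statement).
MIGRATIONS = [
    (1, "CREATE TABLE bibrec (id INTEGER PRIMARY KEY)"),
    (2, "ALTER TABLE bibrec ADD COLUMN creation_date TEXT"),
]


def current_version(conn):
    """Return the highest applied version, or 0 for a fresh database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version "
        "(version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def upgrade(conn, migrations=MIGRATIONS):
    """Apply, in order, every migration newer than the stored version."""
    applied = []
    version = current_version(conn)
    for target, statement in sorted(migrations):
        if target > version:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)",
                (target,))
            applied.append(target)
    conn.commit()
    return applied
```

Running upgrade() twice is harmless, since the second call finds
nothing newer than the stored version.  The sketch does not address the
hard part mentioned above, namely changes that are reverted or reordered
during the unstable period.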

Which brings me to an analogous alternative:

* One of the roots of these problems is that we allow a long time in
  between releases, accumulating non-trivial DB structure updates.  We
  should rather ``release early, release often''.  After moving from CVS
  to Git, we can easily maintain a plethora of branches, making this
  possible.  Every branch (v1.0, v1.1, v1.2) would have a rock-solid DB
  and etc structure, meaning simple update procedures for the clients:
  not only from the point of view of the DB, but *also* for various
  `etc' and `conf' files.  This was the original plan behind
  <https://twiki.cern.ch/twiki/bin/view/CDS/GitWorkflow#Understanding_official_repo_bran>
  which was to start after v1.0 was out.  The idea is to have something
  like periodic monthly maintenance releases for every branch that
  would be easy, quick and safe to deploy because of their strict
  bugfix-only nature.  (New features would go into new minor version
  branches, see
  <http://invenio-demo.cern.ch/help/hacking/release-numbering>.)

  ... but, of course, it took more time than we had hoped to drift
  towards v1.0, so this scheme is unfortunately not deployed yet...

P.S. I hope you have not lost stuff.  Are there still some problems on
     your site?

P.S. BTW, speaking of table updates, I think having 7M records made it
     necessary to change columns in bibrec/bibrec_bibxxx/bibxxx tables
     from MEDIUMINT to INT (or maybe BIGINT but probably not).  This is
     something that has not been committed to git/master yet.  So
     please check it with Giovanni.  It would be good to create
     ``inveniocfg --reset-bibxxx-mediumint'' commands that would do the
     necessary column type altering.  Just for the necessary
     bibxxx-related tables, not everywhere, because BIGINT eats 8 bytes,
     while INT takes only 4 bytes and MEDIUMINT 3 bytes.  So it is in
     our interest to keep these numbers small by default everywhere.
     And a given Invenio installation could use this inveniocfg command
     to choose between INT/MEDIUMINT/BIGINT based on how many records it
     has.
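
     A rough sketch of how such a choice could be made follows.  None
     of this exists in inveniocfg; the function names and the headroom
     factor are assumptions for illustration, and the columns are
     assumed to be UNSIGNED.  The input is the largest id a given table
     must hold, which for bibxxx tables is presumably larger than the
     number of records.

```python
# MySQL unsigned integer types: (name, bytes per value, maximum value).
UNSIGNED_TYPES = [
    ("MEDIUMINT", 3, 2**24 - 1),   # up to 16,777,215
    ("INT", 4, 2**32 - 1),         # up to 4,294,967,295
    ("BIGINT", 8, 2**64 - 1),
]


def smallest_unsigned_type(max_id, headroom=2.0):
    """Pick the smallest type that holds max_id with room to grow."""
    needed = int(max_id * headroom)
    for name, _size, limit in UNSIGNED_TYPES:
        if needed <= limit:
            return name
    raise ValueError("value too large for any MySQL integer type")


def alter_statement(table, column, max_id):
    """Build an ALTER TABLE statement for the chosen column type.

    MODIFY restates the whole column definition, so a real command
    would also have to repeat attributes such as AUTO_INCREMENT or
    DEFAULT where the column has them; this sketch omits them.
    """
    return "ALTER TABLE %s MODIFY %s %s UNSIGNED NOT NULL" % (
        table, column, smallest_unsigned_type(max_id))
```

     For example, smallest_unsigned_type(20000000) gives "INT", while a
     table whose largest id is in the thousands stays at "MEDIUMINT".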

Best regards
-- 
Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
