Hi Benoit:

On Thu, 10 Dec 2009, Benoit Thiell wrote:
> Additions to the latest Makefile update script (currently v0.99.1) are
> added regularly but differences between the code and the MySQL table
> updating scripts remain

Yes, this is because the ``make update-v0.x.y-tables'' commands are meant
for stable-to-stable version updates only, so these Makefile targets
are usually checked only shortly before a stable release is due. They
are not meant for git/master updates in between releases, which is more
of a bleeding-edge thing. E.g. sometimes reindexing is needed, sometimes
migration kits are needed, and I'm not sure it is worth maintaining
these targets for every moment in time during the unstable period.

What is often done, though, is that the development version number
(YYYYMMDD) in configure.ac is bumped whenever there are significant DB
table changes or updates to the file organization. So one can in
principle check whether the dev version number in configure.ac has
changed, and then diff tabcreate.sql between the two given
bleeding-edge dates.

That was a rough description of the current practice. As for how to
improve it:

> * Reject all the commits that touches the database structure without
> updating the Makefile.

This may not be practical. What if some changes are later partially
reverted? Some conditionals would then have to be maintained for people
who did git/master updates after YYYYMMDD1 but not later than
YYYYMMDD2, etc.

> * Using the DROP TABLE statement in the update scripts should be
> allowed only if the table to remove will actually disappear from the
> system (c.f. update-v0.92.1-tables). Table definitions should be
> updated and not recreated. People might have useful information in
> there.

That DROP TABLE statement concerned only an internal indexed ranking
data structure. It had to be dropped because of a change in the
citation handling. The RELEASE-NOTES instructions advise people to
rerun indexing when that is needed. So no valuable data is lost in this
case... just an internal structure that is regenerated as needed as
part of the upgrade.

> * Create an automated database updating system that would rely on an
> internal database version number.

Yes, that would be nice indeed. But it may not be practical to maintain
this fully for all the in-between-stable-releases periods anyway (see
above). Which brings me to an analogous alternative:

* One of the roots of these problems is that we allow a long time in
  between releases, accumulating non-trivial DB structure updates. We
  should rather ``release early, release often''. After moving from
  CVS to Git, we can easily maintain a plethora of branches, making
  this possible. Every branch (v1.0, v1.1, v1.2) would have a
  rock-solid DB and `etc' structure, meaning simple update procedures
  for the clients: not only from the point of view of the DB, but
  *also* for the various `etc' and `conf' files. This was the original
  plan behind
  <https://twiki.cern.ch/twiki/bin/view/CDS/GitWorkflow#Understanding_official_repo_bran>
  which was to start after v1.0 was out. With something like periodic
  monthly maintenance releases for every branch, updates would be
  easy, quick and safe to deploy because of their strict bugfix-only
  nature. (New features would go into new minor version branches, see
  <http://invenio-demo.cern.ch/help/hacking/release-numbering>.)

... but, of course, it took more time than we had hoped to drift
towards v1.0, so this scheme is unfortunately not deployed yet...

P.S. I hope you have not lost stuff. Are there some problems on your
site still?

P.S. BTW, speaking of table updates, I think having 7M records made it
necessary to change columns in the bibrec/bibrec_bibxxx/bibxxx tables
from MEDIUMINT to INT (or maybe BIGINT, but probably not). This has not
been committed to git/master yet, so please check it with Giovanni. It
would be good to create ``inveniocfg --reset-bibxxx-mediumint''
commands that would do the necessary column type altering.
This would be done only for the bibxxx-related tables that need it, not
everywhere, because BIGINT takes 8 bytes, while INT takes only 4 bytes
and MEDIUMINT 3 bytes. So it is in our interest to keep these column
sizes small by default everywhere. A given Invenio installation could
then use this inveniocfg command to choose between INT/MEDIUMINT/BIGINT
based on how many records it has.

Best regards
-- 
Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
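
To make the INT/MEDIUMINT/BIGINT trade-off described above concrete,
here is a minimal sketch in Python of how such a choice could be made.
It is only an illustration under stated assumptions, not existing
inveniocfg code: the function name choose_id_column_type and the
headroom parameter are hypothetical, the columns are assumed to be
UNSIGNED, and the input is the largest identifier a column must hold
(which may differ from the number of records).

```python
# Unsigned MySQL integer types as (name, storage bytes, maximum value).
# The UNSIGNED assumption is ours; signed columns hold half as much.
UNSIGNED_INT_TYPES = [
    ("MEDIUMINT", 3, 2**24 - 1),
    ("INT", 4, 2**32 - 1),
    ("BIGINT", 8, 2**64 - 1),
]


def choose_id_column_type(max_id, headroom=2.0):
    """Return (type name, bytes) of the smallest type that fits.

    max_id   -- largest identifier currently stored in the column
    headroom -- hypothetical growth factor applied before choosing
    """
    if max_id < 0 or headroom < 1.0:
        raise ValueError("max_id must be >= 0 and headroom >= 1.0")
    needed = int(max_id * headroom)
    for name, nbytes, limit in UNSIGNED_INT_TYPES:
        if needed <= limit:
            return name, nbytes
    raise ValueError("identifier does not fit into BIGINT UNSIGNED")


if __name__ == "__main__":
    for largest_id in (1000000, 20000000, 5000000000):
        name, nbytes = choose_id_column_type(largest_id)
        print(largest_id, name, nbytes)
```

Picking the smallest type that still leaves some headroom keeps the
3/4/8 byte storage cost low for small installations, while large ones
can move up one step when they approach the limit.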
