Hi Benoit:

On Tue, 20 Apr 2010, Benoit Thiell wrote:

> I am currently working on a small script that will update the record
> structures in the database. I am wondering about the keys in the table
> bibfmt. I think that it would make sense to ditch the keys on the
> columns 'id_bibrec' and 'format' and introduce a unique composite key
> (id_bibrec, format).
Yes, definitely.

1) One thing to keep in mind, though, is that some client code might search the bibfmt table in a non-composite-index-friendly way, which would result in very slow queries if we delete the separate indexes without offering an alternative. So we may perhaps introduce a primary index as you described, plus an additional index on the left-out column if that column is used in stand-alone-column SQL queries. Otherwise we can simply check every bibfmt SQL query to confirm that it uses the composite index column order properly; see for example
<http://dev.mysql.com/doc/refman/4.1/en/multiple-column-indexes.html>

2) It may be interesting to check how big the DB table index files are before and after the change. E.g. for the INSPIRE site, we currently have:

  2.5G bibfmt.MYD
   65M bibfmt.MYI

which is perfectly reasonable. In the ADS case, the bibfmt table may be huge, and if the MYI index file is huge too, it may be interesting to investigate options like:

(i) splitting the bibfmt table into several bibfmt sub-tables, one for every format. Though increasing the number of tables is not ideal either, so I am not sure about the usefulness of this option; it may not be worth it.

(ii) if the MYD/MYI files are still not too huge, then we can eventually think of not compressing the BLOB values, as we spoke about in Boston with respect to the citation dictionary. Perhaps the non-compressed size will remain reasonable, and we shall gain on the compress/uncompress CPU time. I guess your MYD file is huge already, especially if you cache recstruct too. But maybe we can use several uncompressed bibfmt tables, i.e. a mix of (i) and (ii).

I am just thinking of taking this opportunity to do some more bibfmt-related optimization studies and changes.

3) Also, the ALTER TABLE statements for the update-v0.99.1-tables target should behave very safely in case there are dupes, e.g.
telling the user what to do if ALTER TABLE fails (= remove the conflicting rows, recheck the bibxxx values, and rerun bibreformat and/or XM re-generation). Hmm, maybe you can just do it automatically: since we advise people to take a backup before upgrading to v1.0 anyway, fixing all the vulnerable recIDs automatically (while reporting them to the user) will probably do. If it is not practical to do this inside the update-v0.99.1-tables Makefile target due to limited LOC space, then please write a little migration kit script; see the various *_migration_kit.py files we have been providing.

Best regards
--
Tibor Simko
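A minimal sketch of the "fix the dupes automatically, while reporting them" idea from point 3), again using SQLite in place of MySQL; the surrogate `id` and `value` columns are hypothetical, only id_bibrec and format come from the actual table. It keeps one row per (id_bibrec, format) pair, deletes the rest, and only then creates the unique composite key:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE bibfmt (
                   id INTEGER PRIMARY KEY,
                   id_bibrec INTEGER,
                   format TEXT,
                   value BLOB)""")
con.executemany(
    "INSERT INTO bibfmt (id_bibrec, format, value) VALUES (?, ?, ?)",
    [(1, "xm", b"old"), (1, "xm", b"new"),   # duplicate (1, 'xm') pair
     (1, "hb", b"x"), (2, "xm", b"y")])

# find the duplicate (id_bibrec, format) pairs so we can report them:
dupes = con.execute("""SELECT id_bibrec, format
                         FROM bibfmt
                        GROUP BY id_bibrec, format
                       HAVING COUNT(*) > 1""").fetchall()
for id_bibrec, fmt in dupes:
    print("WARNING: duplicate bibfmt rows for recID %s, format %s"
          % (id_bibrec, fmt))

# keep only the newest row (highest id) per pair, delete the rest:
con.execute("""DELETE FROM bibfmt
                WHERE id NOT IN (SELECT MAX(id)
                                   FROM bibfmt
                                  GROUP BY id_bibrec, format)""")

# now the unique composite key can be created without failing:
con.execute("CREATE UNIQUE INDEX bibfmt_rec_fmt "
            "ON bibfmt (id_bibrec, format)")
```

Whether "newest wins" is the right policy is a judgment call; the reported recIDs could instead be fed to bibreformat for a clean re-generation.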
