Hi Benoit:

On Tue, 20 Apr 2010, Benoit Thiell wrote:
> I am currently working on a small script that will update the record
> structures in the database. I am wondering about the keys in the table
> bibfmt. I think that it would make sense to ditch the keys on the
> columns 'id_bibrec' and 'format' and introduce a unique composite key
> (id_bibrec, format).

Yes, definitely.

1) One thing to keep in mind though is that some client code might
search the bibfmt table in a non-composite-index-friendly way, which
would result in very slow queries if we deleted the separate indexes
without offering an alternative.  So we could introduce the primary
index as you described, plus an additional index on the left-out column
if it is used in stand-alone-column SQL queries.  Otherwise we can
simply check every bibfmt SQL query to confirm that it uses the
composite index column order properly.  See for example
<http://dev.mysql.com/doc/refman/4.1/en/multiple-column-indexes.html>
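To illustrate the leftmost-prefix rule in practice, here is a quick
sketch using Python's sqlite3 as a stand-in for MySQL (the plan output
format differs, but the prefix rule is the same; the bibfmt schema below
is simplified, with made-up column types):

```python
import sqlite3

# In-memory stand-in for the bibfmt table with the proposed composite key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bibfmt (id_bibrec INTEGER, format TEXT, value BLOB)")
conn.execute("CREATE UNIQUE INDEX bibfmt_rec_fmt ON bibfmt (id_bibrec, format)")

def plan(sql):
    """Return the query plan text, to see whether the composite index is used."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# These use the composite index: WHERE covers a leftmost prefix.
print(plan("SELECT value FROM bibfmt WHERE id_bibrec=1 AND format='xm'"))
print(plan("SELECT value FROM bibfmt WHERE id_bibrec=1"))

# This cannot use the (id_bibrec, format) index: 'format' alone is not a
# leftmost prefix, hence the possible need for an extra index on 'format'.
print(plan("SELECT value FROM bibfmt WHERE format='xm'"))
```

So any bibfmt query filtering on 'format' alone would fall back to a
full table scan unless we keep a stand-alone index on that column.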

2) It may be interesting to check how big the DB table index files are
before and after the change.  E.g. for the INSPIRE site, we currently
have:

   2.5G bibfmt.MYD
    65M bibfmt.MYI

which is perfectly reasonable.  In the ADS case, the bibfmt table may be
huge, and if the MYI index file is huge too, it may be interesting to
investigate options like: (i) splitting the bibfmt table into several
bibfmt sub-tables, one per format.  Though increasing the number of
tables is not ideal either, so I am not sure this option would be worth
it.  (ii) If the MYD/MYI files are still not too huge, then we could
eventually think of not compressing the BLOB values, as we discussed in
Boston with respect to the citation dictionary.  Perhaps the
uncompressed size will remain reasonable, and we would gain the
compress/uncompress CPU time.  I guess your MYD file is huge already,
especially if you cache recstruct too.  But maybe we could use several
uncompressed bibfmt tables, i.e. a mix of (i) and (ii).  I am just
thinking of taking this opportunity to do some more bibfmt-related
optimization studies and changes.
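To get a feel for the size-vs-CPU trade-off in (ii), a back-of-the-
envelope sketch with Python's zlib (the MARCXML-ish sample record below
is made up; for real numbers one would measure actual bibfmt BLOBs):

```python
import time
import zlib

# A made-up MARCXML-like record, repeated to reach a realistic BLOB size.
record = ('<record><datafield tag="245"><subfield code="a">'
          'Some title words here</subfield></datafield></record>\n' * 200).encode()

compressed = zlib.compress(record)
print("uncompressed: %d bytes, compressed: %d bytes (ratio %.2f)"
      % (len(record), len(compressed), len(compressed) / float(len(record))))

# Time many decompressions, as the formatter would pay on every cache hit.
t0 = time.time()
for _ in range(1000):
    zlib.decompress(compressed)
print("1000 decompressions took %.3f s" % (time.time() - t0))
```

Measuring this on real ADS records would tell us whether the saved CPU
time is worth the larger MYD file.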

3) Also, the ALTER TABLE statements for the update-v0.99.1-tables target
should behave very safely in case there are dupes, e.g. telling the user
what to do if ALTER TABLE fails (=remove conflicting rows, recheck
bibxxx values, and rerun bibreformat and/or XM re-generation).  Hmm,
maybe you can just do it automatically: since we advise people to take a
backup before upgrading to v1.0 anyway, automatically fixing all the
vulnerable recIDs will probably do (while reporting them to the user).
If it is not too practical to do this inside update-v0.99.1-tables
Makefile due to limited LOC space, then please write a little migration
kit script, see all the various *_migration_kit.py files we have been
providing.
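A migration-kit-style sketch of that automatic fixing might look like
the following (sqlite3 stands in for MySQLdb here; keeping the row with
the highest rowid as "most recent" is my assumption, and on real MySQL
one would need some other tiebreaker column):

```python
import sqlite3

def fix_bibfmt_dupes(conn):
    """Remove duplicate (id_bibrec, format) rows, keeping the row with the
    highest rowid (assumed most recent), and return the affected recIDs so
    they can be reported to the user and rerun through bibreformat."""
    dupes = conn.execute("""SELECT id_bibrec, format FROM bibfmt
                            GROUP BY id_bibrec, format
                            HAVING COUNT(*) > 1""").fetchall()
    for id_bibrec, fmt in dupes:
        conn.execute("""DELETE FROM bibfmt
                        WHERE id_bibrec=? AND format=?
                          AND rowid != (SELECT MAX(rowid) FROM bibfmt
                                        WHERE id_bibrec=? AND format=?)""",
                     (id_bibrec, fmt, id_bibrec, fmt))
    conn.commit()
    return [row[0] for row in dupes]

# Demo on an in-memory table with one duplicated (recID, format) pair:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bibfmt (id_bibrec INTEGER, format TEXT, value BLOB)")
conn.executemany("INSERT INTO bibfmt VALUES (?, ?, ?)",
                 [(1, "xm", b"old"), (1, "xm", b"new"), (2, "xm", b"ok")])
print("vulnerable recIDs:", fix_bibfmt_dupes(conn))
# Now the unique composite key can be added without failing:
conn.execute("CREATE UNIQUE INDEX bibfmt_rec_fmt ON bibfmt (id_bibrec, format)")
```

The cleanup runs first, so the subsequent ALTER TABLE / CREATE UNIQUE
INDEX step cannot fail on duplicates any more.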

Best regards
--
Tibor Simko
