On Fri, 16 Mar 2012, Giovanni Di Milia wrote:
> I also checked the database where we are uploading all 10M records and
> right now, after almost 6M records uploaded, the same files are 1.4G
> for the MYD and 1.2G for the MYI.

Thanks for your tests.  This seems perfectly reasonable given your instance
size and the RAM you have available.

For MARC field values that already differ in their first few characters,
such as titles, or for values that are short, such as authors, it does not
matter much how far we raise the limit globally.  For big tables with long
values the index does grow.  For example, for the abstract field (bib52x)
on the CDS instance of Invenio, indexing 85 instead of 35 leading
characters raised the index size from 47 MB to 84 MB, but this is totally
acceptable.  The total size of all bib[0-9][0-9]x.MYI indexes is 427 MB
for the CDS instance and 597 MB for the INSPIRE instance.  If we increase
the indexing to, say, 100 leading characters, I would guess the indexes
may grow to 800 MB or thereabouts, which is again totally acceptable.
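
The reason short values barely matter can be sketched as follows.  A MySQL
prefix index such as `KEY kv (value(N))` keeps at most the first N
characters of each value, so each row contributes min(len(value), N)
characters to the key.  The function name `indexed_chars` and the sample
values below are made up for illustration; the sketch counts characters
only and ignores character-set byte widths and MyISAM key packing, which
both affect the real on-disk size.

```python
# Illustration only: a MySQL prefix index such as KEY kv (value(N)) keeps at
# most the first N characters of each value, so a row contributes
# min(len(value), N) characters to the index key.

def indexed_chars(values, prefix_len):
    """Total characters a prefix index of length prefix_len would store."""
    return sum(min(len(v), prefix_len) for v in values)

# Made-up sample data, not taken from any real Invenio instance.
authors = ["Doe, J.", "Roe, R.", "Smith, A. B."]
abstracts = ["Lorem ipsum " * 20, "dolor sit amet " * 10]

for name, values in [("authors", authors), ("abstracts", abstracts)]:
    before = indexed_chars(values, 35)
    after = indexed_chars(values, 100)
    print(f"{name}: {before} -> {after} indexed characters")
# prints:
# authors: 26 -> 26 indexed characters
# abstracts: 70 -> 200 indexed characters
```

The short author values are unaffected by the higher limit, while the long
abstract-like values grow by the extra characters per row, which matches
the bib52x observation above in direction if not in exact proportion.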

So I'd say let's indeed increase the limit globally for all bibxxx tables,
from 35 to, say, 100.  That should give Invenio better-prepared defaults
for instances where we cannot know in advance what kind of values will go
into which MARC tags.
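
For an existing database, the global change could be applied with one
statement per table, in the same form as the test below.  This is a
hypothetical helper, not the actual commit: the name
`kv_index_statements` is made up, and it assumes every table from bib00x
to bib99x already has an index named `kv` on `value`.  It only prints the
SQL, which can then be reviewed and fed to the mysql client.

```python
# Hypothetical helper: emit the MySQL statements that would rebuild the kv
# prefix index on `value` for every bibXXx table (bib00x ... bib99x).
# Assumes each table already has an index named kv, as bib99x does.

def kv_index_statements(prefix_len=100):
    statements = []
    for i in range(100):
        table = f"bib{i:02d}x"  # bib00x, bib01x, ..., bib99x
        statements.append(
            f"ALTER TABLE {table} DROP INDEX kv, "
            f"ADD INDEX kv (value({prefix_len}));"
        )
    return statements

if __name__ == "__main__":
    for stmt in kv_index_statements():
        print(stmt)
```

Dropping and re-adding the index in a single ALTER TABLE means each table
is rebuilt once rather than twice, which matters for the large tables.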

BTW, you seem to have a limit of 200 for bib99x now.  I think your URLs
would all fit well into 100, wouldn't they?  Could you please do one more test:

   CREATE TABLE test_bib99x LIKE bib99x;
   ALTER TABLE test_bib99x DROP INDEX kv, ADD INDEX kv (value(100));
   INSERT INTO test_bib99x SELECT * FROM bib99x;
   OPTIMIZE TABLE bib99x;
   OPTIMIZE TABLE test_bib99x;

and check the sizes of the `bib99x.MYI' and `test_bib99x.MYI' index
files?  (And perhaps also check the querying and insertion speed, if time
permits.)
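
One way to read off the two index sizes is to stat the files directly,
since MyISAM keeps each table's index in `<table>.MYI` under the
database's directory in the MySQL datadir.  The path `DATADIR` and the
function names below are assumptions for illustration; the real location
depends on the installation.  Alternatively, `SHOW TABLE STATUS LIKE
'bib99x'` reports the same figure in its `Index_length` column.

```python
import os

# Hypothetical path: MySQL keeps MyISAM files under <datadir>/<database>/,
# but the actual datadir and database name depend on the installation.
DATADIR = "/var/lib/mysql/invenio"

def myi_sizes(db_dir, tables):
    """Return {table: size in bytes} of each table's .MYI index file."""
    return {t: os.path.getsize(os.path.join(db_dir, t + ".MYI")) for t in tables}

def report(db_dir, old="bib99x", new="test_bib99x"):
    """Print both index sizes in MiB and the new/old size ratio."""
    sizes = myi_sizes(db_dir, [old, new])
    for table, size in sizes.items():
        print(f"{table}.MYI: {size / 2**20:.1f} MiB")
    print(f"ratio {new}/{old}: {sizes[new] / sizes[old]:.2f}")
    return sizes

# Usage (on the database host, after both OPTIMIZE TABLE runs): report(DATADIR)
```

Running it after both OPTIMIZE TABLE statements keeps the comparison fair,
since both index files are then freshly rebuilt.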

(E.g. on the INSPIRE TEST instance, which has 2M rows in bib99x, going
from 35 to 100 for references increased bib99x.MYI from only 50 MB to
53 MB; but we don't store long, similar URLs there like you do.)
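
Long URLs that share a common head behave differently from short
references because a prefix index can only distinguish values within its
first N characters.  Every row whose first N characters match maps to the
same key, and an exact-match lookup must then read and compare all of
those rows.  The URLs below are made up for illustration, and
`distinct_prefixes` is a hypothetical helper name.

```python
# Illustration with made-up URLs: values sharing a long common head collapse
# onto a single key under a short prefix index.

def distinct_prefixes(values, prefix_len):
    """Number of distinct keys a prefix index of this length would hold."""
    return len({v[:prefix_len] for v in values})

base = "http://example.org/repository/files/collection-2012/record"
urls = [f"{base}/{n:06d}/fulltext.pdf" for n in range(1000)]

print(max(len(u) for u in urls))     # 78, so 100 characters cover each URL
print(distinct_prefixes(urls, 35))   # 1: every URL shares its first 35 chars
print(distinct_prefixes(urls, 100))  # 1000: every URL gets its own key
```

For such data the longer prefix both stores more characters per row and
makes the index far more selective, whereas short, varied references gain
little on either count.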

In summary, I'll commit a global change that makes 100 the new default
prefix length of the kv index everywhere.

Best regards
--
Tibor Simko
