On Tue, 12 Nov 2013, [email protected] wrote:
> I'm currently experiencing VERY slow bibindex times (~2-3min/rec) for
> itemcount and filetype indexes[1] and records that have many authors
> (for example Atlas related experiments). I'm running the latest master.
> Have you noticed it too at CERN?
Nope. On my laptop, using Invenio demo site, when I recreate itemcount
and filetype indexes, the whole process takes about 2 seconds per index.
For 141 demo records. This means an indexing speed of about 4k records
per minute.
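To double-check that figure, here is a tiny Python sketch of the arithmetic. The 141 records and ~2 seconds come from the run above. The helper name is only for illustration.

```python
def records_per_minute(n_records, seconds):
    """Indexing throughput in records per minute (illustrative helper)."""
    return n_records * 60.0 / seconds

# 141 demo records re-indexed in about 2 seconds for one index
cached = records_per_minute(141, 2)
print(cached)  # 4230.0, i.e. "about 4k records per minute"
```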
> I could always re-enable CFG_BIBUPLOAD_SERIALIZE_RECORD_STRUCTURE and
> maybe speed things up a little, but I was trying to save some space in
> the DB :)
Aha! Without serialisation, the record structure and the virtual fields
have to be recreated all the time on the fly, hence a lot of slowness.
I'd strongly advise you to revert the setting back to its default. The
disk space savings should not be considerable anyway. Having record
structure pre-cached will help in many other parts of Invenio as well.
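The following minimal sketch shows why pre-caching pays off. It assumes a toy in-memory dict instead of Invenio's real storage, and every name in it is hypothetical, not the bibfield API. The expensive step that derives the virtual fields runs once, and later reads only deserialise the cached copy.

```python
import pickle
import zlib

calls = {"build": 0}  # counts how often the expensive path runs

def build_record_structure(record):
    """Hypothetical expensive step: derive virtual fields on the fly."""
    calls["build"] += 1
    return {
        "recid": record["recid"],
        "authors": list(record["authors"]),
        "itemcount": len(record.get("items", [])),
        "filetypes": sorted({f.rsplit(".", 1)[-1] for f in record.get("files", [])}),
    }

# Hypothetical cache: recid -> compressed, serialised structure
# (a stand-in for the table where serialised records would live).
_cache = {}

def get_record_structure(record, reset_cache=False):
    """Rebuild the structure only when it is missing or a reset is forced."""
    recid = record["recid"]
    if reset_cache or recid not in _cache:
        blob = pickle.dumps(build_record_structure(record))
        _cache[recid] = zlib.compress(blob)
    return pickle.loads(zlib.decompress(_cache[recid]))

rec = {"recid": 10, "authors": ["A. Author"] * 3000,
       "files": ["paper.pdf", "data.tar.gz"]}
for _ in range(5):
    struct = get_record_structure(rec)
print(calls["build"])  # 1: built once, then served from the cache
```

In this toy, forcing `reset_cache=True` on every call would make each read pay the full rebuild cost. Many-author records make that cost especially high.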
> Is there anything that I could do to make it run a bit faster?
We could introduce different record field configurations for different
record types, e.g. the itemcount index could be activated only for
books. However, the pre-caching of record structure would still be very
desirable.
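Such a per-type configuration could look roughly like the sketch below. The mapping and the function name are hypothetical illustrations of the idea, not an existing Invenio configuration format.

```python
# Hypothetical configuration: which indexes apply to which record types.
INDEXES_BY_TYPE = {
    "book": {"itemcount", "filetype"},
    "article": {"filetype"},
}

def indexes_for(record_type, requested):
    """Keep only the requested indexes that are configured for this type."""
    allowed = INDEXES_BY_TYPE.get(record_type, set())
    return [name for name in requested if name in allowed]

print(indexes_for("article", ["itemcount", "filetype"]))  # ['filetype']
print(indexes_for("book", ["itemcount", "filetype"]))     # ['itemcount', 'filetype']
```

With this configuration, an article with thousands of authors would not even be looked at by the itemcount indexer.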
* * *
Still, a speed of two minutes per record seems excessively slow. It
should not be due entirely to the lack of pre-caching, I guess.
I've just run an experiment by suppressing the record cache and re-indexing
filetype. Got an exception that we should fix :) However, when forcing
reset_cache=True on bibfield.get_record(), the speed I got was about 40
seconds to re-index 141 demo records. This gives a speed of about 210
records per minute. As expected, this is much slower (20x) than with
the record cache on. But it is still much, much faster (400x) than the
value of 0.5 records per minute that you observed?!
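For the record, here are the ratios spelled out in Python. The inputs are the timings quoted above: about 2 s with the cache, about 40 s with reset_cache=True, and about 2 minutes per record reported. The helper is only illustrative.

```python
def records_per_minute(n_records, seconds):
    """Indexing throughput in records per minute (illustrative helper)."""
    return n_records * 60.0 / seconds

cached = records_per_minute(141, 2)     # record cache on
uncached = records_per_minute(141, 40)  # reset_cache=True forced
reported = 1 / 2.0                      # ~2 min per record = 0.5 rec/min

print(uncached)             # 211.5, i.e. "about 210 records per minute"
print(cached / uncached)    # 20.0
print(uncached / reported)  # 423.0, i.e. "about 400x"
```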
Can you please force-index one record using profiling? E.g.
$ sudo -u www-data /opt/invenio/bin/bibindex -u admin \
    -w filetype -a -i 10 --profile=cumulative
Then let's see in `/opt/invenio/var/log/bibsched_task_123.log' where the
bottleneck is...
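When reading the log, it may help to know what a cumulative-sorted profile looks like. I'm assuming the --profile=cumulative output resembles Python's cProfile/pstats report sorted by cumulative time, but I have not checked that against the bibindex code. The standard library can produce the same kind of report locally. `slow_index` below is a made-up placeholder for the indexing call.

```python
import cProfile
import io
import pstats

def slow_index(n):
    """Hypothetical placeholder for indexing one record."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_index(200000)
profiler.disable()

# Sort by cumulative time so the outermost expensive calls come first.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
stats.print_stats(5)  # show only the top 5 entries
report = out.getvalue()
print(report)
```

The top rows of such a report usually point straight at the expensive call chain, e.g. the code that rebuilds the record structure.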
Best regards
--
Tibor Simko