#49: Bibclassify slow - cache loading
-------------------------+--------------------------------------------------
 Reporter:  rchyla       |       Owner:  rchyla
     Type:  defect       |      Status:  new
 Priority:  major        |   Milestone:
Component:  BibClassify  |     Version:
 Keywords:               |
-------------------------+--------------------------------------------------
BibClassify is very slow when loading its cache.
The cache contains regexes (many of them), and Python recompiles each of
them during load:
http://stackoverflow.com/questions/65266/caching-compiled-regex-objects-in-python
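A minimal sketch of why the load is expensive, assuming Python 3 and the
standard pickle module (the profile in this ticket comes from cPickle on
Python 2, where the mechanism is the same). The pattern below is made up
for illustration and is not taken from the BibClassify taxonomy. A
pickled regex carries only its source string and flags, so every load in
a fresh process has to parse and compile the pattern again:

```python
import pickle
import re

# Hypothetical keyword pattern, used only for illustration.
source = r"\bhigh\s+energy\s+physics\b"
compiled = re.compile(source, re.IGNORECASE)

payload = pickle.dumps(compiled)

# The payload stores the pattern source and flags, not the compiled
# program, so it stays small but must be recompiled on load.
print(len(payload), source.encode("utf-8") in payload)

# Unpickling goes back through the re module's compile routine.
restored = pickle.loads(payload)
print(restored.pattern == source, restored.flags == compiled.flags)
```

With many thousands of patterns in the cache, that per-pattern compile
cost is paid once per pattern on every load.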
Since the taxonomy is now much bigger, I need to investigate ways to:
- make the cache smaller (perhaps share regexes among Keyword objects)
- import it only once (and share it among threads)
- make the console application interactive (so the penalty is paid
only on the first run)
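The first two ideas above could be sketched roughly as follows. This is
not the BibClassify implementation: the names shared_regex, Keyword and
get_cache, as well as the loader argument, are hypothetical and chosen
only for illustration. The sketch pools compiled regexes by source
string so that identical patterns are compiled once, and guards the
cache load with a lock so that all threads share one loaded copy:

```python
import re
import threading

# Pool of compiled regexes keyed by (source, flags); hypothetical helper.
_pattern_pool = {}
_pool_lock = threading.Lock()


def shared_regex(source, flags=0):
    """Return one compiled regex per distinct (source, flags) pair."""
    key = (source, flags)
    with _pool_lock:
        compiled = _pattern_pool.get(key)
        if compiled is None:
            compiled = _pattern_pool[key] = re.compile(source, flags)
        return compiled


class Keyword(object):
    """Stand-in for a taxonomy keyword; not the real BibClassify class."""

    def __init__(self, label, source):
        self.label = label
        # Keywords with the same pattern source share one regex object.
        self.regex = shared_regex(source)


_cache = None
_cache_lock = threading.Lock()


def get_cache(loader):
    """Load the cache at most once per process and share it among threads."""
    global _cache
    if _cache is None:
        with _cache_lock:
            # Re-check inside the lock: another thread may have loaded it.
            if _cache is None:
                _cache = loader()
    return _cache
```

Here loader stands for whatever function actually reads the cache from
disk. In a long-running process the first caller pays the load penalty
and later callers, in any thread, get the already loaded object.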
Note that the cache is loaded only once, as this profile shows:
Mon May 17 15:25:57 2010    /opt/cds-invenio/var/tmp/invenio-profile-stats-20100517152447.raw

         10719206 function calls (10505840 primitive calls) in 69.884 CPU seconds

   Ordered by: cumulative time

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          1    0.000    0.000   69.884   69.884 webinterface_handler.py:330(_handler)
        2/1    0.000    0.000   69.883   69.883 webinterface_handler.py:171(_traverse)
          1    0.000    0.000   69.878   69.878 websearch_webinterface.py:418(__call__)
          1    0.000    0.000   69.874   69.874 search_engine.py:3986(perform_request_search)
          1    0.000    0.000   69.762   69.762 search_engine.py:3214(print_records)
          1    0.000    0.000   69.750   69.750 bibclassify_webinterface.py:65(main_page)
          1    0.000    0.000   69.734   69.734 bibclassify_webinterface.py:189(generate_keywords)
          1    0.000    0.000   69.720   69.720 bibclassify_engine.py:141(get_keywords_from_local_file)
          1    0.000    0.000   69.637   69.637 bibclassify_engine.py:165(get_keywords_from_text)
          1    0.000    0.000   56.950   56.950 bibclassify_ontology_reader.py:69(get_regular_expressions)
          1    0.003    0.003   56.948   56.948 bibclassify_ontology_reader.py:572(_get_cache)
          1    1.354    1.354   56.945   56.945 {cPickle.load}
      17752    0.527    0.000   55.818    0.003 re.py:227(_compile)
      17701    0.465    0.000   55.023    0.003 sre_compile.py:501(compile)
      17701    0.325    0.000   31.684    0.002 sre_parse.py:669(parse)
28313/17701    0.743    0.000   30.786    0.002 sre_parse.py:307(_parse_sub)
34424/17701    8.570    0.000   30.433    0.002 sre_parse.py:385(_parse)
      17701    0.197    0.000   22.618    0.001 sre_compile.py:486(_code)
83568/17701    5.385    0.000   17.814    0.001 sre_compile.py:38(_compile)
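
A listing of this kind can be reproduced on a small scale with the
standard cProfile and pstats modules. The sketch below assumes Python 3
and uses synthetic patterns, not the real taxonomy, and the helper names
build_payload and load_payload are made up for illustration. It pickles
a list of compiled regexes, empties the re module's internal cache with
re.purge() to imitate a fresh process, and profiles the load:

```python
import cProfile
import io
import pickle
import pstats
import re


def build_payload(count=200):
    """Pickle a list of synthetic compiled regexes (illustrative only)."""
    patterns = [re.compile(r"\bkeyword%d\s+terms?\b" % i)
                for i in range(count)]
    return pickle.dumps(patterns)


def load_payload(payload):
    """Unpickle the regexes; each one goes through re's compile path."""
    return pickle.loads(payload)


payload = build_payload()
re.purge()  # drop re's internal cache so the load has to recompile

profiler = cProfile.Profile()
profiler.enable()
restored = load_payload(payload)
profiler.disable()

# Print the ten most expensive entries, ordered by cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

The exact function names and timings in the report depend on the Python
version, but the compile and parse routines of the re module should
dominate in the same way as in the profile above.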
--
Ticket URL: <http://cdswaredev.cern.ch/invenio/ticket/49>
Invenio <http://cdswaredev.cern.ch/invenio>