#49: Bibclassify slow - cache loading
-------------------------+--------------------------------------------------
 Reporter:  rchyla       |       Owner:  rchyla
     Type:  defect       |      Status:  new   
 Priority:  major        |   Milestone:        
Component:  BibClassify  |     Version:        
 Keywords:               |  
-------------------------+--------------------------------------------------
 BibClassify is very slow when loading its cache.

 The cache contains a large number of regexes, and Python recompiles each
 of them during load:
 http://stackoverflow.com/questions/65266/caching-compiled-regex-objects-in-python
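
 A minimal sketch of why unpickling is expensive, written for Python 3
 (where `pickle` replaces `cPickle`). The keyword pattern is made up for
 illustration and is not taken from the taxonomy. It shows that a pickled
 regex carries its pattern source rather than compiled code, so loading
 has to go through `re` compilation again:

```python
import pickle
import re

# Hypothetical keyword pattern, for illustration only.
compiled = re.compile(r"quark[- ]gluon plasma", re.IGNORECASE)

# Serialize the compiled pattern object.
payload = pickle.dumps(compiled)

# The pattern source travels in the pickle as plain text; the compiled
# state is not stored.
source_in_payload = compiled.pattern.encode("ascii") in payload

# Loading rebuilds the object from (pattern, flags). In a fresh process
# this means a full parse and compile for every regex in the cache.
restored = pickle.loads(payload)

print(source_in_payload)
print(restored.pattern == compiled.pattern, restored.flags == compiled.flags)
```

 Within a single process the `re` module keeps a small internal cache, so
 this round trip looks cheap here. The cost shows up when a new process
 loads thousands of distinct patterns, as in the profile in this ticket.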

 Since we have a much bigger taxonomy now, I need to investigate ways to:
 - make the cache smaller (perhaps share regexes among Keyword objects)
 - import it only once (and share it among threads)
 - make the console application interactive (so that the penalty is paid
 only on the first run)

 The cache is loaded only once, so the cost is the load itself. In this
 profile, cPickle.load accounts for 56.9 of the 69.9 seconds:

 Mon May 17 15:25:57 2010    /opt/cds-invenio/var/tmp/invenio-profile-stats-20100517152447.raw

         10719206 function calls (10505840 primitive calls) in 69.884 CPU seconds

   Ordered by: cumulative time

      ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           1    0.000    0.000   69.884   69.884 webinterface_handler.py:330(_handler)
         2/1    0.000    0.000   69.883   69.883 webinterface_handler.py:171(_traverse)
           1    0.000    0.000   69.878   69.878 websearch_webinterface.py:418(__call__)
           1    0.000    0.000   69.874   69.874 search_engine.py:3986(perform_request_search)
           1    0.000    0.000   69.762   69.762 search_engine.py:3214(print_records)
           1    0.000    0.000   69.750   69.750 bibclassify_webinterface.py:65(main_page)
           1    0.000    0.000   69.734   69.734 bibclassify_webinterface.py:189(generate_keywords)
           1    0.000    0.000   69.720   69.720 bibclassify_engine.py:141(get_keywords_from_local_file)
           1    0.000    0.000   69.637   69.637 bibclassify_engine.py:165(get_keywords_from_text)
           1    0.000    0.000   56.950   56.950 bibclassify_ontology_reader.py:69(get_regular_expressions)
           1    0.003    0.003   56.948   56.948 bibclassify_ontology_reader.py:572(_get_cache)
           1    1.354    1.354   56.945   56.945 {cPickle.load}
       17752    0.527    0.000   55.818    0.003 re.py:227(_compile)
       17701    0.465    0.000   55.023    0.003 sre_compile.py:501(compile)
       17701    0.325    0.000   31.684    0.002 sre_parse.py:669(parse)
 28313/17701    0.743    0.000   30.786    0.002 sre_parse.py:307(_parse_sub)
 34424/17701    8.570    0.000   30.433    0.002 sre_parse.py:385(_parse)
       17701    0.197    0.000   22.618    0.001 sre_compile.py:486(_code)
 83568/17701    5.385    0.000   17.814    0.001 sre_compile.py:38(_compile)

-- 
Ticket URL: <http://cdswaredev.cern.ch/invenio/ticket/49>
Invenio <http://cdswaredev.cern.ch/invenio>
