On Wed, 11 Sep 2013, [email protected] wrote:
    $ pip freeze | grep rdflib
    rdflib==2.4.2

FYI, in my installation (based on master), running bibclassify on 0204033.pdf with rdflib 2.4.2 gives the error below. (I downgraded from my original 3.2.3 version using pip just to check whether I would get different results.)

[root@vm]# sudo -u apache /opt/invenio/bin/bibclassify -k HEP /tmp/0204033.pdf
Input file: 0204033.pdf
Traceback (most recent call last):
  File "/opt/invenio/bin/bibclassify", line 62, in <module>
    main()
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_cli.py", line 117, in main
    only_core_tags=options["only_core_tags"])
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_engine.py", line 115, in output_keywords_for_sources
    process_lines()
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_engine.py", line 94, in process_lines
    extract_acronyms=extract_acronyms
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_engine.py", line 179, in get_keywords_from_text
    rebuild=rebuild_cache, no_cache=no_cache))
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_ontology_reader.py", line 157, in get_regular_expressions
    return _get_cache(cache_path, source_file=onto_path)
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_ontology_reader.py", line 727, in _get_cache
    cached_data = cPickle.load(filestream)
ImportError: No module named term
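
One possible reading of this ImportError, offered as an assumption rather than something verified here: the ontology cache was pickled while rdflib 3.2.3 was installed, where node classes live in the rdflib.term module, and the downgraded rdflib 2.4.2 cannot resolve that module when the cache is unpickled. The sketch below illustrates only the general mechanism and a defensive pattern (treat an unloadable cache as missing so it can be rebuilt). The function name load_cache and the stand-in module name are hypothetical and are not part of bibclassify.

```python
import io
import pickle


def load_cache(stream):
    """Return the unpickled object, or None if the cache is unusable."""
    try:
        return pickle.load(stream)
    except (ImportError, AttributeError, pickle.UnpicklingError, EOFError):
        # The cache refers to a module or class that this environment does
        # not provide, or the file is truncated. Signal the caller to rebuild.
        return None


# Hand-written protocol-0 pickle that refers to a class in a module that does
# not exist, standing in for a cache written under another library version.
STALE = b"cno_such_package_for_demo.term\nURIRef\n."
GOOD = pickle.dumps({"keyword": "K0"})

stale_result = load_cache(io.BytesIO(STALE))  # unloadable, so None
good_result = load_cache(io.BytesIO(GOOD))    # loads normally
```

If this reading is right, removing the stale cache file or forcing a cache rebuild after changing the rdflib version should avoid the error.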



With rdflib 3.2.3, the same PDF gives better results:
[root@vm]# sudo -u apache /opt/invenio/bin/bibclassify -k HEP /tmp/0204033.pdf
Input file: 0204033.pdf
ERROR bibclassify.ontology_reader:436 The composite term "http://cern.ch/thesauri/HEPontology.rdf#Composite.decaymodeanomaly" should be made of single keywords, but at least one is missing
ERROR bibclassify.ontology_reader:439    Missing is: decaymode
ERROR bibclassify.ontology_reader:451 We reset this composite keyword, so that it does not match anything. Please fix the taxonomy.
ERROR bibclassify.ontology_reader:436 The composite term "http://cern.ch/thesauri/HEPontology.rdf#Composite.operatorkinetics" should be made of single keywords, but at least one is missing
ERROR bibclassify.ontology_reader:439    Missing is: kinetics
ERROR bibclassify.ontology_reader:451 We reset this composite keyword, so that it does not match anything. Please fix the taxonomy.

Author keywords:
--

Composite keywords:
6  saturation: density [25, 30]
6  nucleus: stability [47, 7]
6  energy: symmetry [35, 11]
4  nucleus: mass [47, 16]
4  nucleon: density [13, 30]
3  energy: Coulomb [35, 3]
2  energy: density [35, 30]
2  nuclear matter: asymmetry [21, 2]
1  n: matter [49, 36]
1  n: density [49, 30]
1  n: mass [49, 16]
1  p: density [20, 30]
1  nucleus: binding energy [47, 2]
1  nucleus: ground state [47, 1]
1  nuclear matter: saturation [21, 25]
1  p: charge [20, 22]
1  energy: surface [35, 6]
1  resonance: energy [3, 35]
1  p: mass [20, 16]
1  form factor: charge [2, 22]

Single keywords:
49  K0
23  equation of state
12  slope
4  mass number
4  nuclide
3  nuclear model
3  mass formula
3  A1
2  charge distribution
2  elastic scattering
2  neutron star
2  correlation
2  monopole
2  helium
2  X-ray
1  numerical calculations
1  parametrization
1  surface tension
1  electrostatic
1  nuclear force

Core keywords:
49  K0
1  light nucleus
-  heavy ion (1)

Field codes:
--

Acronyms:
--

--
bibclassify v0.4.9


I don't know if it helps at all, but it seems that rdflib 3.2.3 may be usable as well.

Best regards,
Theodoros
