On Wed, 11 Sep 2013, [email protected] wrote:
$ pip freeze | grep rdflib
rdflib==2.4.2
FYI, in my installation (based on master), running bibclassify on
0204033.pdf with rdflib 2.4.2 gives the error below.
(I downgraded from my original 3.2.3 version using pip, just to check
whether I would get different results.)
[root@vm]# sudo -u apache /opt/invenio/bin/bibclassify -k HEP /tmp/0204033.pdf
Input file: 0204033.pdf
Traceback (most recent call last):
  File "/opt/invenio/bin/bibclassify", line 62, in <module>
    main()
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_cli.py", line 117, in main
    only_core_tags=options["only_core_tags"])
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_engine.py", line 115, in output_keywords_for_sources
    process_lines()
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_engine.py", line 94, in process_lines
    extract_acronyms=extract_acronyms
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_engine.py", line 179, in get_keywords_from_text
    rebuild=rebuild_cache, no_cache=no_cache))
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_ontology_reader.py", line 157, in get_regular_expressions
    return _get_cache(cache_path, source_file=onto_path)
  File "/usr/lib64/python2.6/site-packages/invenio/bibclassify_ontology_reader.py", line 727, in _get_cache
    cached_data = cPickle.load(filestream)
ImportError: No module named term
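A guess at the cause, not something I have verified: the traceback ends in cPickle.load, so the ontology cache on disk was probably pickled while rdflib 3.2.3 was installed and refers to classes in a module (most likely rdflib.term) that rdflib 2.4.2 does not have. The sketch below reproduces that kind of failure with a made-up module name and shows one way a loader could fall back to rebuilding the cache. It is written for Python 3 with the standard pickle module; hypothetical_term, write_stale_cache and load_cache_or_rebuild are illustrative names, not Invenio or rdflib code.

```python
import pickle
import sys
import types


def write_stale_cache(path):
    """Pickle an object whose class lives in a module that then disappears.

    "hypothetical_term" is a stand-in for a module that exists in one
    library version but not in another; it is not a real package.
    """
    module = types.ModuleType("hypothetical_term")
    cls = type("URIRef", (str,), {"__module__": "hypothetical_term"})
    module.URIRef = cls
    sys.modules["hypothetical_term"] = module
    try:
        with open(path, "wb") as stream:
            pickle.dump({"key": cls("http://example.org/x")}, stream)
    finally:
        # Simulate switching to a library version without this module.
        del sys.modules["hypothetical_term"]


def load_cache_or_rebuild(path, rebuild):
    """Return the unpickled cache, or rebuild() if it cannot be loaded."""
    try:
        with open(path, "rb") as stream:
            return pickle.load(stream)
    except (ImportError, AttributeError, pickle.UnpicklingError, EOFError):
        # The cache refers to a module or class the current install lacks,
        # or the file is damaged, so regenerate it from the source data.
        return rebuild()
```

If that is what happened here, regenerating the cache under rdflib 2.4.2 (the traceback shows rebuild_cache and no_cache parameters being passed to get_regular_expressions) might give a fairer comparison between the two versions.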
While the same PDF with rdflib 3.2.3 gives better results:
[root@vm]# sudo -u apache /opt/invenio/bin/bibclassify -k HEP /tmp/0204033.pdf
Input file: 0204033.pdf
ERROR bibclassify.ontology_reader:436 The composite term "http://cern.ch/thesauri/HEPontology.rdf#Composite.decaymodeanomaly" should be made of single keywords, but at least one is missing
ERROR bibclassify.ontology_reader:439 Missing is: decaymode
ERROR bibclassify.ontology_reader:451 We reset this composite keyword, so that it does not match anything. Please fix the taxonomy.
ERROR bibclassify.ontology_reader:436 The composite term "http://cern.ch/thesauri/HEPontology.rdf#Composite.operatorkinetics" should be made of single keywords, but at least one is missing
ERROR bibclassify.ontology_reader:439 Missing is: kinetics
ERROR bibclassify.ontology_reader:451 We reset this composite keyword, so that it does not match anything. Please fix the taxonomy.
Author keywords:
--
Composite keywords:
6 saturation: density [25, 30]
6 nucleus: stability [47, 7]
6 energy: symmetry [35, 11]
4 nucleus: mass [47, 16]
4 nucleon: density [13, 30]
3 energy: Coulomb [35, 3]
2 energy: density [35, 30]
2 nuclear matter: asymmetry [21, 2]
1 n: matter [49, 36]
1 n: density [49, 30]
1 n: mass [49, 16]
1 p: density [20, 30]
1 nucleus: binding energy [47, 2]
1 nucleus: ground state [47, 1]
1 nuclear matter: saturation [21, 25]
1 p: charge [20, 22]
1 energy: surface [35, 6]
1 resonance: energy [3, 35]
1 p: mass [20, 16]
1 form factor: charge [2, 22]
Single keywords:
49 K0
23 equation of state
12 slope
4 mass number
4 nuclide
3 nuclear model
3 mass formula
3 A1
2 charge distribution
2 elastic scattering
2 neutron star
2 correlation
2 monopole
2 helium
2 X-ray
1 numerical calculations
1 parametrization
1 surface tension
1 electrostatic
1 nuclear force
Core keywords:
49 K0
1 light nucleus
- heavy ion (1)
Field codes:
--
Acronyms:
--
--
bibclassify v0.4.9
I don't know if it helps at all, but it seems that rdflib 3.2.3 may
work as well.
Best regards,
Theodoros