#197: Errors in text extraction without running 'make install-pdfa-helper-files'
------------------------+---------------------------------------------------
Reporter: bthiell | Owner: skaplun
Type: defect | Status: assigned
Priority: major | Milestone: v1.1
Component: WebSubmit | Version:
Resolution: | Keywords:
------------------------+---------------------------------------------------
Changes (by skaplun):
* milestone: v1.0 => v1.1
Comment:
I think Benoit's patch might slightly improve the usability of the demo
site, though I would rename ''--extract-text-from-records'' to
''--extract-text-from-demo-records'' to make it clear that it is meant to
be run on demo records (in particular with regard to the explicit
reference to OCRing of recid 97). Even if
''--extract-text-from-demo-records'' is not explicitly called by the user,
textification will be handled anyway by the first BibIndex call that does
not skip fulltext search.
On the other hand, a smarter algorithm for document conversion is
definitely needed, e.g. one that would be able to handle extensions such
as .gif.pdf;pdfa and automatically fall back to the conversion for .pdf
files.
This actually might come as part of #14.
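As a rough illustration of the fallback idea above, here is a minimal
sketch, not the actual WebSubmit code. The CONVERTERS table, the handler
names, the function names and the sample file name are all hypothetical;
the only assumption taken from this ticket is that the part after ';'
(e.g. ;pdfa) is a subformat that can be ignored when choosing a converter.

```python
import os

# Hypothetical table mapping a normalized extension to a handler name.
# These names are placeholders, not real Invenio converters.
CONVERTERS = {".pdf": "pdf2text", ".ps": "ps2text", ".txt": "copy"}


def candidate_extensions(filename):
    """Yield extensions from most to least specific.

    The ';subformat' suffix is dropped first, so 'x.gif.pdf;pdfa'
    yields '.gif.pdf' and then '.pdf'.
    """
    name = os.path.basename(filename).split(";", 1)[0].lower()
    parts = name.split(".")[1:]
    for i in range(len(parts)):
        yield "." + ".".join(parts[i:])


def pick_converter(filename, converters=CONVERTERS):
    """Return (extension, handler) for the most specific known extension,
    or None when no converter matches."""
    for ext in candidate_extensions(filename):
        if ext in converters:
            return ext, converters[ext]
    return None


print(pick_converter("figure_01.gif.pdf;pdfa"))
```

With such a lookup, a compound extension like .gif.pdf;pdfa has no
dedicated converter and therefore falls through to the plain .pdf one,
while a file with no known extension is reported as unconvertible rather
than raising an error.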
--
Ticket URL: <http://invenio-software.org/ticket/197#comment:7>
Invenio <http://invenio-software.org>