#197: Errors in text extraction without running 'make install-pdfa-helper-files'
------------------------+---------------------------------------------------
  Reporter:  bthiell    |       Owner:  skaplun 
      Type:  defect     |      Status:  assigned
  Priority:  major      |   Milestone:  v1.1    
 Component:  WebSubmit  |     Version:          
Resolution:             |    Keywords:          
------------------------+---------------------------------------------------
Changes (by skaplun):

  * milestone:  v1.0 => v1.1


Comment:

 I think Benoit's patch might improve the usability of the demo site a bit,
 though I would rename ''--extract-text-from-records'' to
 ''--extract-text-from-demo-records'' to make it clear that it is meant to
 be run on demo records (in particular with regard to the explicit
 reference to OCRing of recid 97). Even if
 ''--extract-text-from-demo-records'' is not explicitly called by the
 user, textification will be handled anyway by the first BibIndex call
 that does not skip fulltext search.

 On the other hand, a smarter algorithm for document conversion is
 definitely needed, e.g. one that would be able to handle extensions such
 as .gif.pdf;pdfa and automatically fall back to the conversion for .pdf
 files.
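
 As a rough illustration of that fallback (a minimal sketch only: the
 function names and the converter table below are hypothetical and are not
 Invenio's actual API), one could drop the ''';subformat''' part, try the
 full compound extension first, and then progressively shorter suffixes:

```python
import os

# Hypothetical converter table (extension -> converter name); illustrative
# only, not taken from Invenio.
CONVERTERS = {".pdf": "pdf_to_text", ".txt": "copy_text"}


def candidate_extensions(filename):
    """Yield extensions from most to least specific, ignoring ';subformat'."""
    # Work on the file name only, so dots in directory names are ignored.
    name = os.path.basename(filename)
    # Drop a trailing subformat such as ';pdfa'.
    name = name.split(";", 1)[0]
    # Everything after the first dot is treated as the compound extension.
    parts = name.lower().split(".")[1:]
    for i in range(len(parts)):
        yield "." + ".".join(parts[i:])


def pick_converter(filename, converters=CONVERTERS):
    """Return the first converter matching any candidate extension, or None."""
    for ext in candidate_extensions(filename):
        if ext in converters:
            return converters[ext]
    return None
```

 With this sketch, a file named scan.gif.pdf;pdfa is first looked up as
 .gif.pdf and, since no converter is registered for that, falls back to the
 .pdf converter.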

 This actually might come as part of #14.

-- 
Ticket URL: <http://invenio-software.org/ticket/197#comment:7>
Invenio <http://invenio-software.org>
