#1013: Check for gibberish in references before accepting them
-------------------------------------------+-------------------------
Reporter:  adeiana                         |       Owner:  adeiana
    Type:  enhancement                     |      Status:  new
Priority:  minor                           |   Component:  DocExtract
 Version:                                  |  Resolution:
Keywords:  garbage pdftotext pdf2text OCR  |
-------------------------------------------+-------------------------
Changes (by skaplun):

 * keywords:   => garbage pdftotext pdf2text OCR


Comment:

 Hi Alessio,

 This would be a nice feature to have factored out and made available to
 the general textification process. Indeed, we do not yet have a heuristic
 for detecting garbage coming out of pdftotext.

 If you implement such a heuristic, it would be nice if it were written in
 a generic way and put e.g. in textutils or in bibdocfile, so that BibIndex
 can also avoid indexing garbage.
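
As a rough illustration of what such a generic check might look like, here is
a minimal sketch in Python 3. The function name `looks_like_garbage`, its
parameters, and the threshold values are all hypothetical; none of them exist
in textutils or bibdocfile. The idea is simply to reject text that contains
too few alphanumeric characters or too few word-like tokens.

```python
import re

# Hypothetical helper; not an existing Invenio function.
# A "word-like" token contains at least two consecutive letters
# (Unicode-aware: word characters minus digits and underscore).
_LETTERS_RE = re.compile(r"[^\W\d_]{2,}")


def looks_like_garbage(text, min_alnum_ratio=0.6, min_word_ratio=0.3):
    """Guess whether extracted text is garbage.

    The thresholds are illustrative assumptions and would need tuning
    on real pdftotext/OCR output. Empty text is treated as garbage.
    """
    tokens = text.split()
    if not tokens:
        return True

    # Share of alphanumeric characters among non-whitespace characters.
    chars = "".join(tokens)
    alnum = sum(1 for ch in chars if ch.isalnum())
    if alnum / len(chars) < min_alnum_ratio:
        return True

    # Share of tokens that look like words rather than symbols or digits.
    wordlike = sum(1 for tok in tokens if _LETTERS_RE.search(tok))
    return wordlike / len(tokens) < min_word_ratio
```

Because the function only takes a string, a caller such as the reference
extractor or an indexer could apply it per line or per document before
accepting the text.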

 Cheers!
     Sam

-- 
Ticket URL: <http://invenio-software.org/ticket/1013#comment:1>
Invenio <http://invenio-software.org>
