Hello,
Some time ago, I filed this bug report against Fedora 7: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=247393 and it seems that nothing has happened since.

I am posting to this mailing list because I consider this bug important: it makes pdftotext completely useless for languages that use accented characters (such as French, in my case). Furthermore, pdftotext is currently used by Tracker ( http://www.gnome.org/projects/tracker/ ) to extract text from PDF files. Tracker then indexes the content of the resulting text file, which is assumed to match the content of the original PDF file. Since pdftotext destroys accented characters, Tracker does not index the affected words correctly and users cannot find them later.

In my bug report, you will find a PDF file containing accented characters, plus a LaTeX file from which another one can be generated. I also attached the output I obtained with pdftotext.

I am not very interested in installing Poppler 0.6.x (and to be honest, I am afraid of what such an install could break on my computer: LaTeX? Some PDF reader? etc.), so I would like to know whether this bug has already been fixed. If it has not, I would like to bring it to the attention of the Poppler developers.

Regards,
Laurent Aguerreche
