Public bug reported:
When extracting text from a PDF file written in Spanish with pdftotext,
accented characters (ü, á) are extracted incorrectly: the accent mark
appears on its own and the base letter is lost.
Example:
Original text => Extracted text
Facultad de Matemática y Computación => Facultad de Matem´tica y Computaci´n
Analizadores Multilingües en FreeLing => Analizadores Multiling¨es en FreeLing
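
A minimal sketch of how extracted output could be checked for this symptom. It assumes the stray marks in the example above are the spacing characters U+00B4 (acute accent) and U+00A8 (diaeresis), and that pdftotext is on the PATH. The helper names extract_text and find_suspect_words are hypothetical and not part of poppler. Because the base letter is missing from the output, this only flags affected words and does not try to repair them.

```python
import subprocess

# Assumption: the stray marks in the report are these spacing (non-combining)
# characters: U+00B4 ACUTE ACCENT and U+00A8 DIAERESIS.
SPACING_ACCENTS = {"\u00b4", "\u00a8"}


def extract_text(pdf_path):
    """Run pdftotext on pdf_path and return its output ("-" means stdout)."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_suspect_words(text):
    """Return the words in text that contain a standalone accent mark."""
    return [
        word
        for word in text.split()
        if any(ch in SPACING_ACCENTS for ch in word)
    ]
```

For example, find_suspect_words(extract_text("FreeLing 3.pdf")) would list the affected words in the attached file if the output matches the example above.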
** Affects: poppler (Ubuntu)
Importance: Undecided
Status: New
** Attachment added: "FreeLing 3.pdf"
https://bugs.launchpad.net/bugs/1527318/+attachment/4536318/+files/FreeLing%203.pdf
** Description changed:
When extracting text from a PDF file written in Spanish with pdftotext,
accented characters (ü, á) are extracted incorrectly: the accent mark
appears on its own and the base letter is lost.
Example:
- Original text
Extracted text
- Facultad de Matemática y Computación Facultad de
Matem´tica y Computaci´n
- Analizadores Multilingües en FreeLing
Analizadores Multiling¨es en FreeLing
+ Original text => Extracted text
+ Facultad de Matemática y Computación => Facultad de Matem´tica y Computaci´n
+ Analizadores Multilingües en FreeLing => Analizadores Multiling¨es en FreeLing
--
https://bugs.launchpad.net/bugs/1527318
Title:
pdftotext extraction of accented characters