Hello,

I am using pdftotext on a PDF file that uses a rare, old 8-bit encoding.
By default, pdftotext uses the -enc UTF-8 setting, so characters from the 8-bit encoding become multibyte sequences in the output text file.
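To illustrate why the UTF-8 output is multibyte: a character from the upper half of an 8-bit code page has a Unicode code point of U+0080 or above, and UTF-8 stores any such code point in two or more bytes. Here is a small Python sketch. It uses Latin-1 purely as a stand-in for the real legacy encoding, which is not named here.

```python
# Illustration only: Latin-1 stands in for the unnamed legacy 8-bit encoding.
raw = bytes([0xE9])             # one byte in the 128..255 range
char = raw.decode("latin-1")    # interpreted as the Unicode character U+00E9
utf8 = char.encode("utf-8")     # how a UTF-8 text file stores that character

print(f"U+{ord(char):04X}", raw, "->", utf8)
# prints: U+00E9 b'\xe9' -> b'\xc3\xa9'
```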

I need to preserve that encoding. I can handle or convert it later if necessary. Is there a way to tell pdftotext to copy the characters as-is, in this 8-bit encoding?

I have tried different -enc options. The best results come from Latin1, but with it not all of the letters are copied to the resulting text file.
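One way to see exactly which letters Latin1 loses is to keep the default UTF-8 output and re-encode it yourself, recording every character the target code page cannot represent. The sketch below makes several assumptions: Python 3.9+, pdftotext on PATH, and the hypothetical helper names extract_utf8 and to_8bit. The "-" output file sends the text to stdout. It can only restore what pdftotext's glyph-to-Unicode mapping already produced. It cannot recover the original byte codes from the PDF.

```python
import subprocess


def extract_utf8(pdf_path: str) -> str:
    """Run pdftotext with UTF-8 output and return the extracted text.

    The output file "-" makes pdftotext write to stdout."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8")


def to_8bit(text: str, codec: str = "latin-1") -> tuple[bytes, list[str]]:
    """Re-encode text into a single-byte codec.

    Characters the codec cannot represent are written as b"?" and collected
    in a list, so the caller can see which letters would otherwise be lost."""
    out = bytearray()
    missing: list[str] = []
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
            out += b"?"
    return bytes(out), missing


# Example on a literal string (no PDF needed): U+0416 is not in Latin-1.
data, lost = to_8bit("caf\u00e9 \u0416")
print(data, lost)
```

If the legacy encoding corresponds to a code page that Python knows (for example "cp1251", used here only as a hypothetical choice), pass that name as codec. Otherwise a hand-written mapping table would be needed in its place.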

I need pdftotext not to convert at all, just to ignore the encoding. Or, at least, to pass characters in the range 127..255 through as-is, without conversion.

Is it possible?

Thank you.
_______________________________________________
poppler mailing list
https://lists.freedesktop.org/mailman/listinfo/poppler
