Hello,
I am using pdftotext on a PDF file that uses a rare, old 8-bit encoding.
By default pdftotext writes its output as UTF-8 (the default for -enc),
so the 8-bit characters become multibyte sequences in the output text file.
I need to preserve the original encoding, and I can handle or convert it
later if necessary. Is there some way to tell the pdftotext utility to
copy the characters as they are, in this 8-bit encoding?
I have tried different -enc options. The best results are with Latin1,
but even then not all the letters are copied to the resulting text file.
I need to tell pdftotext not to convert anything and simply ignore the
encoding, or at least to pass characters in the range 127..255 through
unchanged.
Is it possible?
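
To make the goal concrete, here is roughly the post-processing I could do
myself on the UTF-8 output. It is only a minimal sketch, assuming Python 3.
The function name to_single_bytes and the default codec are placeholders of
mine, not anything from poppler, and I do not know which codec (if any)
matches the encoding in this file.

```python
def to_single_bytes(text, codec="latin-1"):
    """Encode text with an 8-bit codec.

    Returns (data, missing): the encoded bytes, plus a sorted list of the
    characters that the codec cannot represent (written as '?' in data).
    The codec name is an assumption; pick the one that matches the file.
    """
    missing = set()
    for ch in set(text):
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.add(ch)
    # errors="replace" substitutes '?' for anything the codec lacks
    data = text.encode(codec, errors="replace")
    return data, sorted(missing)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python script.py output.txt [codec]
    path = sys.argv[1]
    codec = sys.argv[2] if len(sys.argv) > 2 else "latin-1"
    with open(path, encoding="utf-8") as f:
        data, missing = to_single_bytes(f.read(), codec)
    with open(path + ".8bit", "wb") as f:
        f.write(data)
    if missing:
        print("characters lost:", " ".join(missing))
```

With the latin-1 codec, code points 128..255 come out as the single byte
with the same value, which is the pass-through I am after. Anything above
255 is lost, though, and that looks like the same problem I see with
-enc Latin1.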
Thank you.