It would be interesting to see a hex dump of one of the lines (say, using xxd), but of course you're looking at a line of text that something extracted from the pdf, so it's probably already mangled before you can run it through xxd.
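
Something like this (python, just because it's everywhere) would dump the raw bytes of one suspect line; the file name and line number are only placeholders for whatever you extracted:

    # dump the raw bytes of one suspect line (line 42 here, as an example)
    with open("extracted.txt", "rb") as f:
        raw_line = f.read().splitlines()[41]
    print(raw_line.hex(" "))   # e.g. c2/c3 bytes hint at UTF-8 being displayed as latin1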

It could be confusion between character sets, like latin1 vs UTF-8. Maybe there is some normally invisible 'start of line' or 'end of line' character that Adobe Japan adds, and emacs is interpreting it in the wrong character set. It might be as simple as changing the locale settings to en_US.utf8 or en_US.iso88591 before reading the doc.
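
One rough way to test that theory without touching the locale at all is to read the raw bytes yourself, decode them both ways, and see which interpretation looks sane (again, the file name is just a placeholder):

    # compare the two interpretations of the same bytes
    raw = open("extracted.txt", "rb").read()
    print(raw.decode("utf-8", errors="replace")[:200])   # UTF-8 interpretation
    print(raw.decode("latin-1")[:200])                   # latin1 interpretation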

If you're reading it with emacs, there's probably a helper program (or library) extracting the data from the pdf, so there are lots of places for a linux program to do a mangled conversion. Typically that means something assuming latin1 and mangling a UTF-8 character into gibberish; the reverse is also possible, but not usually the case. Most systems default to UTF-8 nowadays, but lots of programs aren't UTF-8 aware and try to reinterpret the data as latin1.
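
Here's roughly what that mangling looks like (the character is just an example):

    # UTF-8 bytes displayed as if they were latin1 turn one character into two
    text = "é"                       # any non-ASCII character
    raw = text.encode("utf-8")       # b'\xc3\xa9'
    print(raw.decode("latin-1"))     # prints 'Ã©' -- the typical gibberish
    # a UTF-8 no-break space is c2 a0; read as latin1, c2 shows up as 'Â',
    # which may be where a stray caret-A comes from
    # the reverse direction (latin1 bytes fed to a UTF-8 decoder) usually
    # just produces replacement characters instead:
    print(b"\xe9".decode("utf-8", errors="replace"))   # prints '�'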

You might see if Adobe's Acrobat Reader works any better.
steve



wes wrote:
On Sun, Oct 13, 2019 at 11:03 AM Rich Shepard <[email protected]>
wrote:

Why GUI? I view the text in emacs and joe. Both show the same thing and
there's no way to specify the caret-A and caret-a.


GUI is easier, or at least easier to learn. One can select the offending
character, copy it to the clipboard, and paste it into the find/replace
search field. Maybe there's an equivalent function in emacs. I do this in
vi all the time, though the mechanism is not nearly so straightforward as
in a GUI-based text editor.

-wes

