It would be interesting to see a hex dump of one of the lines (say, using xxd), but of course you're looking at a line of text that something extracted from the pdf, so it's probably already mangled before you can run it through xxd.
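
Something like this (python, just because it's everywhere) would dump the raw bytes of one suspect line; the file name and line number are only placeholders for whatever you extracted:

    # dump the raw bytes of one suspect line (line 42 here, as an example)
    with open("extracted.txt", "rb") as f:
        raw_line = f.read().splitlines()[41]
    print(raw_line.hex(" "))   # e.g. c2/c3 bytes hint at UTF-8 being displayed as latin1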

It could be confusion between character sets, like latin1 vs UTF-8. Maybe there is some normally invisible 'start of line' or 'end of line' character that Adobe Japan adds, and emacs is interpreting it in the wrong character set. It might be as simple as changing the locale settings to en_US.utf8 or en_US.iso88591 before reading the doc.
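
One rough way to test that theory without touching the locale at all is to read the raw bytes yourself, decode them both ways, and see which interpretation looks sane (again, the file name is just a placeholder):

    # compare the two interpretations of the same bytes
    raw = open("extracted.txt", "rb").read()
    print(raw.decode("utf-8", errors="replace")[:200])   # UTF-8 interpretation
    print(raw.decode("latin-1")[:200])                   # latin1 interpretation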

If you're reading it with emacs, there's probably a helper program (or library) extracting the data from the pdf, so there are lots of places for a linux program to do a mangled conversion. Typically that means something assuming latin1 and mangling a UTF-8 character into gibberish; the reverse is also possible, but not usually the case. Most systems default to UTF-8 nowadays, but lots of programs aren't UTF-8 aware and try to reinterpret the data as latin1.
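
Here's roughly what that mangling looks like (the character is just an example):

    # UTF-8 bytes displayed as if they were latin1 turn one character into two
    text = "é"                       # any non-ASCII character
    raw = text.encode("utf-8")       # b'\xc3\xa9'
    print(raw.decode("latin-1"))     # prints 'Ã©' -- the typical gibberish
    # a UTF-8 no-break space is c2 a0; read as latin1, c2 shows up as 'Â',
    # which may be where a stray caret-A comes from
    # the reverse direction (latin1 bytes fed to a UTF-8 decoder) usually
    # just produces replacement characters instead:
    print(b"\xe9".decode("utf-8", errors="replace"))   # prints '�'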

You might see if Adobe's Acrobat Reader works any better.
steve



wes wrote:
On Sun, Oct 13, 2019 at 11:03 AM Rich Shepard <[email protected]>
wrote:

Why GUI? I view the text in emacs and joe. Both show the same thing and
there's no way to specify the caret-A and caret-a.


GUI is easier, or at least easier to learn. One can select the offending
character, copy it to the clipboard, and paste it into the find/replace
search field. Maybe there's an equivalent function in emacs. I do this in
vi all the time, though the mechanism is not nearly so straightforward as
in a GUI-based text editor.

-wes

