It would be interesting to see a hex dump of one of the lines (say,
using xxd), but of course you're looking at a line of text
that something extracted from the PDF, so it's probably already mangled
before you can run xxd.
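If you want to try anyway, a little Python sketch like this shows the
raw bytes of whatever the extractor produced, roughly what xxd would
give you ("extracted.txt" is just a placeholder for wherever you saved
the text):

    # Hex-dump each line of the extracted text, similar to xxd output.
    # "extracted.txt" is a stand-in name; point it at your own file.
    with open("extracted.txt", "rb") as f:
        for line in f:
            print(" ".join(f"{b:02x}" for b in line))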
It could be confusion between character sets, like latin1 vs UTF-8.
Maybe there is some normally invisible 'start of line' or 'end of line'
character that Adobe Japan adds, and emacs is interpreting it in the
wrong character set.
It might be as simple as changing the locale settings to en_US.utf8 or
en_US.iso88591 before reading the doc.
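For what it's worth, you can check which encoding your programs will
assume by default (driven by LANG/LC_ALL) with something like:

    # Report the encoding that programs will assume from the locale settings.
    import locale
    print(locale.getpreferredencoding())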
If you're reading it with emacs, there's probably a helper program (or
library) extracting the data from the PDF, so there are lots of
places for a Linux program to do a mangled conversion. Typically a
program assumes latin1 and mangles the UTF-8 characters into gibberish;
the reverse is also possible, but not usually the case. Most systems
default to UTF-8 nowadays, but lots of programs aren't UTF-8 aware and
try to reinterpret the data as latin1.
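Here's a rough illustration of that usual direction of mangling, using
"é" as the example character (any non-ASCII character behaves the same
way):

    # UTF-8 bytes re-read as latin1: "é" (0xc3 0xa9 in UTF-8) comes out "Ã©".
    mangled = "é".encode("utf-8").decode("latin-1")
    print(mangled)  # Ã©

    # The reverse mistake usually just fails, because a lone latin1 byte
    # like 0xe9 isn't valid UTF-8.
    try:
        "é".encode("latin-1").decode("utf-8")
    except UnicodeDecodeError as exc:
        print(exc)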
You might see if Adobe's Acrobat Reader works any better.
steve
wes wrote:
On Sun, Oct 13, 2019 at 11:03 AM Rich Shepard <[email protected]>
wrote:
Why GUI? I view the text in emacs and joe. Both show the same thing and
there's no way to specify the caret-A and caret-a.
GUI is easier, or at least easier to learn. One can select the offending
character, copy it to the clipboard, and then paste it into the
find/replace search field. Maybe there's an equivalent function in emacs. I
do this in vi all the time, though the mechanism is not nearly so
straightforward as in a GUI-based text editor.
-wes
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug