On Sun, 13 Oct 2019 08:21:02 -0700 (PDT)
Rich Shepard <[email protected]> wrote:

>I've a PDF document (in English) that is apparently encoded using an
>Adobe Japan 1 specification. Every word has an appended ^A (that's an
>uppercase A with a caret over it) and a space.
>
>Do you know of an emacs command to remove these characters? They're not
>unicode (which shows up as ^a\200\2xx).

I know nothing of emacs, but I would copy and paste the text from the
PDF document into a plain GUI text editor (Gedit is what I normally
use) and then use its search and replace function to replace them with
nothing. If I need fancier features I use LibreOffice Writer.
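
For anyone who would rather script it, here is a minimal Python sketch of
the same search and replace. It assumes the stray character is U+00C2
("Â"), sometimes followed by a no-break space (U+00A0), which is what a
UTF-8 no-break space looks like when it is decoded as Latin-1. That is a
guess about the file, not something confirmed in this thread, and the
function name strip_stray_markers is made up for illustration.

```python
def strip_stray_markers(text: str) -> str:
    """Remove stray 'Â' characters left over from a bad decoding.

    Assumption: each stray marker is U+00C2, optionally followed by a
    no-break space (U+00A0) that stood in for a word separator.
    """
    # A marker plus no-break space becomes one ordinary space, so the
    # words on either side stay separated.
    text = text.replace("\u00c2\u00a0", " ")
    # Any remaining marker is followed by an ordinary space already,
    # so it is simply dropped.
    return text.replace("\u00c2", "")


if __name__ == "__main__":
    sample = "Every\u00c2 word\u00c2\u00a0has\u00c2 one"
    print(strip_stray_markers(sample))
```

This removes every "Â" in the text, which should be harmless for an
English document but would damage text that uses that letter on purpose.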

I have this problem occasionally with subtitles, and several of my
subtitle editors have the ability to convert the encoding to UTF-8. Just
now I noticed that Gedit has the ability to set the language of a
document, although the only choices it offers me are various varieties
of English. I wonder what would happen if I opened the document in
Gedit. Ditto for opening it in Writer or Scribus, which also have the
ability to import a PDF document and keep the text as text.
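
The encoding conversion mentioned above amounts to decoding the bytes
with their original encoding and writing them back out as UTF-8. A rough
Python sketch, assuming the source encoding is already known (how the
subtitle editors pick or detect it is not covered here); the name
to_utf8 and the sample encoding are only illustrative.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding to UTF-8.

    Assumption: the caller knows the real source encoding. A wrong
    guess either raises UnicodeDecodeError or yields garbled text.
    """
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # "café" stored as Latin-1 uses one byte for the accented letter.
    latin1_bytes = "caf\u00e9".encode("latin-1")
    print(to_utf8(latin1_bytes, "latin-1").decode("utf-8"))
```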
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
