On Sun, 13 Oct 2019 08:21:02 -0700 (PDT) Rich Shepard <[email protected]> said:
>I've a PDF document (in English) that is apparently encoded using an
>Adobe Japan 1 specification. Every word has an appended ^A (that's an
>uppercase A with a caret over it) and a space.
>
>Do you know of an emacs command to remove these characters? They're not
>unicode (which shows up as ^a\200\2xx).

I know nothing of Emacs, but I would copy and paste the text from the PDF document into a plain GUI text editor (Gedit is what I normally use) and then use its search-and-replace function to replace those characters with nothing. If I need fancier features, I use LibreOffice Writer.

I have this problem occasionally with subtitles, and several of my subtitle editors can convert the encoding to UTF-8.

Just now I noticed that Gedit can set the language of a document, although the only choices it offers me are varieties of English. I wonder what would happen if I opened the document in Gedit. Ditto for opening it in Writer or Scribus, which can also import a PDF document and keep the text as text.

_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
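For anyone who would rather script the search-and-replace step than do it by hand in an editor, here is a minimal Python sketch. It rests on an assumption, not on anything confirmed in this thread: that the stray "Â" is the visible half of a UTF-8 no-break space (bytes 0xC2 0xA0) that some program displayed as Latin-1. Under that assumption it replaces "Â" followed by either an ordinary space or a no-break space with a single plain space. The names strip_stray_a and clean_file are hypothetical helpers, not part of Gedit, Writer, or Emacs.

```python
import re

# Assumption: each "Â " pair is mojibake, i.e. the UTF-8 bytes of a
# no-break space (0xC2 0xA0) shown as if they were Latin-1 text.
# Match "Â" (U+00C2) followed by a plain space or a no-break space (U+00A0).
STRAY = re.compile("\u00c2[ \u00a0]")


def strip_stray_a(text: str) -> str:
    """Replace each 'Â' + space/no-break space with one plain space."""
    return STRAY.sub(" ", text)


def clean_file(src: str, dst: str) -> None:
    """Hypothetical helper: read UTF-8 text from src, write cleaned text to dst."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(strip_stray_a(text))


# Small demonstration on a made-up sample string.
sample = "Every\u00c2 word\u00c2 has\u00c2\u00a0this"
print(strip_stray_a(sample))  # → Every word has this
```

The text would still have to be copied out of the PDF first, as described above; the script only replaces the manual search-and-replace. If the stray character turns out to be something other than U+00C2, only the STRAY pattern needs to change.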
