Re: [AUCTeX] wrong encoding for *output* buffer
On 1 Feb 2018, at 13:27, David Kastrup wrote:

> jfbu writes:
>
>> On 1 Feb 2018, at 12:58, Werner LEMBERG wrote:
>>
>>>>> My question was probably not precise enough: I wonder why auctex
>>>>> doesn't set the buffer encoding also (derived from the master
>>>>> file's local variables), given that auctex itself generates the
>>>>> *xxx output* buffer.
>>>>
>>>> TeX is an 8-bit program wrapping its output, [...]
>>>
>>> I was specifically asking for XeTeX.
>>>
>>>> [...] including output containing quotes of the source as error
>>>> locators, every 79 bytes, irrespective of character boundaries?  It
>>>> also may encode some bytes in the middle of a character as ^^xx.
>>>
>>> IIRC, XeTeX is going to fix that (or already has) so that UTF-8
>>> characters won't be broken in the middle of the sequence.
>>
>> even pdflatex does with option -8bit
>>
>> file test.tex
>>
>>   \documentclass{article}
>>   \begin{document}
>>   \typeout{éàù}
>>   \end{document}
>
> Lines are still getting wrapped after 79 bytes.
>
> --
> David Kastrup

Yes.  Besides, my test file is just crap: I wanted to do it with 7-bit
ASCII control characters instead, once they are given suitable
catcodes.

Apologies,

Jean-François

___
auctex mailing list
auctex@gnu.org
https://lists.gnu.org/mailman/listinfo/auctex
Re: [AUCTeX] wrong encoding for *output* buffer
jfbu writes:

> On 1 Feb 2018, at 12:58, Werner LEMBERG wrote:
>
>>>> My question was probably not precise enough: I wonder why auctex
>>>> doesn't set the buffer encoding also (derived from the master
>>>> file's local variables), given that auctex itself generates the
>>>> *xxx output* buffer.
>>>
>>> TeX is an 8-bit program wrapping its output, [...]
>>
>> I was specifically asking for XeTeX.
>>
>>> [...] including output containing quotes of the source as error
>>> locators, every 79 bytes, irrespective of character boundaries?  It
>>> also may encode some bytes in the middle of a character as ^^xx.
>>
>> IIRC, XeTeX is going to fix that (or already has) so that UTF-8
>> characters won't be broken in the middle of the sequence.
>
> even pdflatex does with option -8bit
>
> file test.tex
>
>   \documentclass{article}
>   \begin{document}
>   \typeout{éàù}
>   \end{document}

Lines are still getting wrapped after 79 bytes.

--
David Kastrup
Re: [AUCTeX] wrong encoding for *output* buffer
On 1 Feb 2018, at 12:58, Werner LEMBERG wrote:

>>> My question was probably not precise enough: I wonder why auctex
>>> doesn't set the buffer encoding also (derived from the master
>>> file's local variables), given that auctex itself generates the
>>> *xxx output* buffer.
>>
>> TeX is an 8-bit program wrapping its output, [...]
>
> I was specifically asking for XeTeX.
>
>> [...] including output containing quotes of the source as error
>> locators, every 79 bytes, irrespective of character boundaries?  It
>> also may encode some bytes in the middle of a character as ^^xx.
>
> IIRC, XeTeX is going to fix that (or already has) so that UTF-8
> characters won't be broken in the middle of the sequence.

Even pdflatex does that with the option -8bit.

File test.tex:

  \documentclass{article}
  \begin{document}
  \typeout{éàù}
  \end{document}

Running

  pdflatex -8bit test.tex

gives in the log:

  [...]
  LaTeX Font Info:... okay on input line 3.
  éàù
  (./test.aux) )
  [...]

Unfortunately, the model chosen by LaTeX for handling UTF-8 encoding
leads to

  \IeC {\'e}\IeC {\`a}\IeC {\`u}

in the output if one adds \usepackage[utf8]{inputenc}, but this is a
LaTeX2e design choice.

It is possible to design a scheme where the 8-bit UTF-8 characters
would be handled correctly with TeX fonts and still give éàù in the
output in such situations.

Jean-François

> Anyway, it's a minor issue.
>
>
>     Werner
Re: [AUCTeX] wrong encoding for *output* buffer
>> My question was probably not precise enough: I wonder why auctex
>> doesn't set the buffer encoding also (derived from the master
>> file's local variables), given that auctex itself generates the
>> *xxx output* buffer.
>
> TeX is an 8-bit program wrapping its output, [...]

I was specifically asking for XeTeX.

> [...] including output containing quotes of the source as error
> locators, every 79 bytes, irrespective of character boundaries?  It
> also may encode some bytes in the middle of a character as ^^xx.

IIRC, XeTeX is going to fix that (or already has) so that UTF-8
characters won't be broken in the middle of the sequence.

Anyway, it's a minor issue.


    Werner
Re: [AUCTeX] wrong encoding for *output* buffer
>> My question was probably not precise enough: I wonder why auctex
>> doesn't set the buffer encoding also (derived from the master
>> file's local variables), given that auctex itself generates the
>> *xxx output* buffer.
>
> Because the output buffer *xxx output* rarely needs to be saved in a
> file.  That buffer is only used for receiving outputs from TeX (and
> some related programs such as makeindex), so AUCTeX just sets the
> coding system for the process communication explicitly, leaving
> `buffer-file-coding-system' of *xxx output* untouched.

OK, thanks.


    Werner
Re: [AUCTeX] wrong encoding for *output* buffer
Hi Werner,

> Werner LEMBERG writes:
> My question was probably not precise enough: I wonder why auctex
> doesn't set the buffer encoding also (derived from the master file's
> local variables), given that auctex itself generates the *xxx output*
> buffer.

Because the output buffer *xxx output* rarely needs to be saved in a
file.  That buffer is only used for receiving outputs from TeX (and
some related programs such as makeindex), so AUCTeX just sets the
coding system for the process communication explicitly, leaving
`buffer-file-coding-system' of *xxx output* untouched.

Best,
Ikumi Keita
Re: [AUCTeX] wrong encoding for *output* buffer
Werner LEMBERG writes:

>>> The only thing which looks strange to me is that, the mode line in
>>> the `*xxx output*' buffer starts with
>>>
>>>   1:**
>>>
>>> (but the UTF-8 contents of the log file as emitted by XeTeX is
>>> correctly displayed).
>>
>> In that case, latin-1 is the coding system for saving that buffer
>> and utf-8 is for decoding the output from external process.  Of
>> several coding systems, the one for saving the buffer is the most
>> important for most cases, so usually only that one is displayed in
>> the mode line.
>
> Thanks for the explanation, which I already knew :-)
>
> My question was probably not precise enough: I wonder why auctex
> doesn't set the buffer encoding also (derived from the master file's
> local variables), given that auctex itself generates the *xxx output*
> buffer.

TeX is an 8-bit program wrapping its output, including output
containing quotes of the source as error locators, every 79 bytes,
irrespective of character boundaries?  It also may encode some bytes in
the middle of a character as ^^xx.  Interpreting its output thus
relies on the output actually being interpretable in the given
encoding.

We might have a bit more leeway here with XEmacs out of the race
(XEmacs' utf-8 encoding and reencoding was not round-trippable).

--
David Kastrup
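
To make the wrapping problem concrete, here is a minimal Python sketch.
It is not TeX's actual code: `wrap_bytes' and `is_valid_utf8' are
hypothetical helpers that only assume the behaviour described above,
namely breaking a byte stream every 79 bytes without regard to
character boundaries.

```python
from typing import List


def wrap_bytes(data: bytes, width: int = 79) -> List[bytes]:
    """Split data into chunks of at most `width` bytes.

    Character boundaries are ignored on purpose, mimicking the
    byte-oriented line wrapping described in the thread (assumption).
    """
    return [data[i:i + width] for i in range(0, len(data), width)]


def is_valid_utf8(chunk: bytes) -> bool:
    """Return True if the chunk decodes as UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 78 ASCII bytes followed by "é", which is two bytes in UTF-8 (C3 A9).
# The 79-byte limit therefore falls between the two bytes of "é".
text = "x" * 78 + "é"
chunks = wrap_bytes(text.encode("utf-8"))

# Neither wrapped line is valid UTF-8 by itself ...
validity = [is_valid_utf8(c) for c in chunks]

# ... but rejoining the bytes before decoding recovers the text.
rejoined = b"".join(chunks).decode("utf-8")
```

Each wrapped line fails to decode in isolation, while the rejoined byte
stream decodes fine, which is why a line-by-line interpretation of such
output in a multibyte encoding is fragile.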
Re: [AUCTeX] wrong encoding for *output* buffer
>> The only thing which looks strange to me is that, the mode line in
>> the `*xxx output*' buffer starts with
>>
>>   1:**
>>
>> (but the UTF-8 contents of the log file as emitted by XeTeX is
>> correctly displayed).
>
> In that case, latin-1 is the coding system for saving that buffer
> and utf-8 is for decoding the output from external process.  Of
> several coding systems, the one for saving the buffer is the most
> important for most cases, so usually only that one is displayed in
> the mode line.

Thanks for the explanation, which I already knew :-)

My question was probably not precise enough: I wonder why auctex
doesn't set the buffer encoding also (derived from the master file's
local variables), given that auctex itself generates the *xxx output*
buffer.


    Werner
Re: [AUCTeX] wrong encoding for *output* buffer
Hi Werner,

> Werner LEMBERG writes:
> The only thing which looks strange to me is that, the mode line in the
> `*xxx output*' buffer starts with

>   1:**

> (but the UTF-8 contents of the log file as emitted by XeTeX is
> correctly displayed).

In that case, latin-1 is the coding system for saving that buffer and
utf-8 is for decoding the output from external process.  Of several
coding systems, the one for saving the buffer is the most important
for most cases, so usually only that one is displayed in the mode
line.

Emacs assigns several coding systems separately according to their
purposes: saving the buffer, decoding the output from process,
encoding the input to process, decoding the keyboard input from text
terminal, encoding the screen output to text terminal...

You can see three of them in the form like

  EEE:**

when you do "emacs -nw" on text terminal.  See the doc string of the
variable `mode-line-mule-info' for detail.

More detailed information about coding systems associated with the
buffer can be displayed via C-h C or M-x describe-coding-system.

Best,
Ikumi Keita
Re: [AUCTeX] wrong encoding for *output* buffer
> AUCTeX determines the coding system for reading from the output of
> asynchronous TeX process by the function
> `TeX-adjust-process-coding-system'.  It basically obeys the coding
> system of the command buffer, i.e., the buffer in which `C-c C-c' or
> something like it is issued.  So I expect that it usually works
> well.

Thanks for the explanation.  Right now, I can't repeat the issue.  The
only thing which looks strange to me is that, the mode line in the
`*xxx output*' buffer starts with

  1:**

(but the UTF-8 contents of the log file as emitted by XeTeX is
correctly displayed).

> [...] putting the
>
>   %%% coding: utf-8
>
> cookie in your ASCII sub file would do the trick.  This sets the
> local value of `buffer-file-coding-system' to the specified
> value, so the utf-8 output from xelatex would be decoded correctly.

I will try that as soon as I encounter the problem again.

> If this is not the case, I'd be grateful if you could provide sample
> xelatex documents to examine.

Will do!


    Werner
Re: [AUCTeX] wrong encoding for *output* buffer
Hi Werner,

> Werner LEMBERG writes:
> [git commit 4b66b9f60e3ce4a552bd4f3230b659347add1446]

> Folks,

> I have the following in my master document.

>   %%% Local Variables:
>   %%% coding: utf-8
>   %%% mode: latex
>   %%% TeX-engine: xetex
>   %%% TeX-PDF-mode: t
>   %%% TeX-master: t
>   %%% End:

> However, the `*xxx output*' buffer (showing the compilation results of
> xetex) is in latin-1 encoding – I guess this is due to

>   (set-language-environment "latin-1")
>   (setq-default buffer-file-coding-system 'latin-1)

> in my `~/.emacs' file...

> What must I do so that auctex obeys the local encoding variables in my
> master document, thus overriding `~/.emacs'?  [AFAICS, xetex *can* be
> forced to use non-UTF8 legacy encodings, so relying on `TeX-engine' is
> probably not sufficient.]

AUCTeX determines the coding system for reading from the output of
asynchronous TeX process by the function
`TeX-adjust-process-coding-system'.  It basically obeys the coding
system of the command buffer, i.e., the buffer in which `C-c C-c' or
something like it is issued.  So I expect that it usually works well.

Does the symptom you described occur for all xelatex documents or for
some particular documents?  If the latter, one possible guess is that

(1) The document is divided into multiple files, and
(2) The sub file of the master file has no multibyte characters, i.e.,
    contains only ASCII characters, and
(3) The command buffer is the one for that ASCII sub file.

In that case, the local value of `buffer-file-coding-system' of the
sub file buffer can be a kind of `undecided-*', and emacs eventually
uses the value of `default-process-coding-system' for reading from the
output of xelatex.  With your settings,
`default-process-coding-system' is (iso-latin-1-* . iso-latin-1-*), so
the utf-8 characters in the output are not decoded correctly.

If this guess is right, putting the

  %%% coding: utf-8

cookie in your ASCII sub file would do the trick.  This sets the
local value of `buffer-file-coding-system' to the specified value, so
the utf-8 output from xelatex would be decoded correctly.

If this is not the case, I'd be grateful if you could provide sample
xelatex documents to examine.

Regards,
Ikumi Keita
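
The symptom described above (UTF-8 process output read with a latin-1
coding system) can be illustrated outside Emacs with a minimal Python
sketch.  This is only an analogy: Python's codecs stand in for the
Emacs coding systems, and the sample string is an assumption, not
output taken from a real xelatex run.

```python
# Bytes as a UTF-8 engine would write them to its output stream.
original = "éàù"
output_bytes = original.encode("utf-8")

# Decoding with the wrong coding system: every byte becomes one
# latin-1 character, so each two-byte UTF-8 sequence turns into two
# unrelated characters (mojibake).
as_latin1 = output_bytes.decode("latin-1")

# Decoding with the coding system that matches the engine's output.
as_utf8 = output_bytes.decode("utf-8")

print(repr(as_latin1))
print(repr(as_utf8))
```

The latin-1 reading never fails, since every byte is a valid latin-1
character; it just yields six wrong characters instead of three
correct ones.  That is consistent with the buffer showing garbage
rather than an error when the process coding system falls back to the
latin-1 default.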