Re: [AUCTeX] wrong encoding for *output* buffer

2018-02-01 Thread jfbu

On 1 Feb 2018, at 13:27, David Kastrup wrote:

> jfbu  writes:
> 
>> On 1 Feb 2018, at 12:58, Werner LEMBERG wrote:
>> 
>>> 
>>>>> My question was probably not precise enough: I wonder why auctex
>>>>> doesn't set the buffer encoding also (derived from the master
>>>>> file's local variables), given that auctex itself generates the
>>>>> *xxx output* buffer.
>>>> 
>>>> TeX is an 8-bit program wrapping its output, [...]
>>> 
>>> I was specifically asking for XeTeX.
>>> 
>>>> [...] including output containing quotes of the source as error
>>>> locators, every 79bytes, irrespective of character boundaries?  It
>>>> also may encode some bytes in the middle of a character as ^^xx.
>>> 
>>> IIRC, XeTeX is going to fix that (or already has) so that UTF-8
>>> characters won't be broken in the middle of the sequence.
>> 
>> 
>> even pdflatex does with option -8bit
>> 
>>  file test.tex
>> \documentclass{article}
>> \begin{document}
>> \typeout{éàù}
>> \end{document}
>> 
> 
> Lines are still getting wrapped after 79 bytes.
> 
> -- 
> David Kastrup

Yes; besides, my test file is just crap.

I meant to do the test with 7-bit ASCII control characters instead,
once they are given suitable catcodes.

apologies

Jean-François


___
auctex mailing list
auctex@gnu.org
https://lists.gnu.org/mailman/listinfo/auctex


Re: [AUCTeX] wrong encoding for *output* buffer

2018-02-01 Thread David Kastrup
jfbu  writes:

> On 1 Feb 2018, at 12:58, Werner LEMBERG wrote:
>
>> 
>>>> My question was probably not precise enough: I wonder why auctex
>>>> doesn't set the buffer encoding also (derived from the master
>>>> file's local variables), given that auctex itself generates the
>>>> *xxx output* buffer.
>>> 
>>> TeX is an 8-bit program wrapping its output, [...]
>> 
>> I was specifically asking for XeTeX.
>> 
>>> [...] including output containing quotes of the source as error
>>> locators, every 79bytes, irrespective of character boundaries?  It
>>> also may encode some bytes in the middle of a character as ^^xx.
>> 
>> IIRC, XeTeX is going to fix that (or already has) so that UTF-8
>> characters won't be broken in the middle of the sequence.
>
>
> even pdflatex does with option -8bit
>
>  file test.tex
> \documentclass{article}
> \begin{document}
> \typeout{éàù}
> \end{document}
> 

Lines are still getting wrapped after 79 bytes.

-- 
David Kastrup



Re: [AUCTeX] wrong encoding for *output* buffer

2018-02-01 Thread jfbu

On 1 Feb 2018, at 12:58, Werner LEMBERG wrote:

> 
>>> My question was probably not precise enough: I wonder why auctex
>>> doesn't set the buffer encoding also (derived from the master
>>> file's local variables), given that auctex itself generates the
>>> *xxx output* buffer.
>> 
>> TeX is an 8-bit program wrapping its output, [...]
> 
> I was specifically asking for XeTeX.
> 
>> [...] including output containing quotes of the source as error
>> locators, every 79bytes, irrespective of character boundaries?  It
>> also may encode some bytes in the middle of a character as ^^xx.
> 
> IIRC, XeTeX is going to fix that (or already has) so that UTF-8
> characters won't be broken in the middle of the sequence.


Even pdflatex does that with the option -8bit:

 file test.tex
\documentclass{article}
\begin{document}
\typeout{éàù}
\end{document}


pdflatex -8bit test.tex

gives in the log


[...]
LaTeX Font Info:... okay on input line 3.
éàù
(./test.aux) ) 
[...]


Unfortunately, the model chosen by LaTeX
for handling UTF-8 encoding leads to

\IeC {\'e}\IeC {\`a}\IeC {\`u}

as output if one adds

\usepackage[utf8]{inputenc}

but this is a LaTeX2e design choice.

It is possible to design a scheme where UTF-8
characters would be handled correctly with TeX
fonts and still give éàù in the output in such situations.

Jean-François


> 
> Anyway, it's a minor issue.
> 
> 
>Werner
> 


Re: [AUCTeX] wrong encoding for *output* buffer

2018-02-01 Thread Werner LEMBERG

>> My question was probably not precise enough: I wonder why auctex
>> doesn't set the buffer encoding also (derived from the master
>> file's local variables), given that auctex itself generates the
>> *xxx output* buffer.
> 
> TeX is an 8-bit program wrapping its output, [...]

I was specifically asking for XeTeX.

> [...] including output containing quotes of the source as error
> locators, every 79bytes, irrespective of character boundaries?  It
> also may encode some bytes in the middle of a character as ^^xx.

IIRC, XeTeX is going to fix that (or already has) so that UTF-8
characters won't be broken in the middle of the sequence.

Anyway, it's a minor issue.


Werner



Re: [AUCTeX] wrong encoding for *output* buffer

2018-02-01 Thread Werner LEMBERG

>> My question was probably not precise enough: I wonder why auctex
>> doesn't set the buffer encoding also (derived from the master
>> file's local variables), given that auctex itself generates the
>> *xxx output* buffer.
> 
> Because the output buffer *xxx output* rarely needs to be saved in a
> file.  That buffer is only used for receiving outputs from TeX (and
> some related programs such as makeindex), so AUCTeX just sets the
> coding system for the process communication explicitly, leaving
> `buffer-file-coding-system' of *xxx output* untouched.

OK, thanks.


Werner



Re: [AUCTeX] wrong encoding for *output* buffer

2018-02-01 Thread Ikumi Keita
Hi Werner,

> Werner LEMBERG  writes:
> My question was probably not precise enough: I wonder why auctex
> doesn't set the buffer encoding also (derived from the master file's
> local variables), given that auctex itself generates the *xxx output*
> buffer.

Because the output buffer *xxx output* rarely needs to be saved in a
file.  That buffer is only used for receiving output from TeX (and some
related programs such as makeindex), so AUCTeX just sets the coding
system for the process communication explicitly, leaving
`buffer-file-coding-system' of *xxx output* untouched.

Best,
Ikumi Keita



Re: [AUCTeX] wrong encoding for *output* buffer

2018-02-01 Thread David Kastrup
Werner LEMBERG  writes:

>>> The only thing which looks strange to me is that, the mode line in
>>> the `*xxx output* buffer starts with
>>>
>>>  1:**
>>> (but the UTF-8 contents of the log file as emitted by XeTeX is
>>> correctly displayed).
>> 
>> In that case, latin-1 is the coding system for saving that buffer
>> and utf-8 is for decoding the output from external process.  Of
>> several coding systems, the one for saving the buffer is the most
>> important for most cases, so usually only that one is displayed in
>> the mode line.
>
> Thanks for the explanation, which I already knew :-)
>
> My question was probably not precise enough: I wonder why auctex
> doesn't set the buffer encoding also (derived from the master file's
> local variables), given that auctex itself generates the *xxx output*
> buffer.

TeX is an 8-bit program that wraps its output, including output containing
quotes of the source as error locators, every 79 bytes, irrespective of
character boundaries.  It also may encode some bytes in the middle of a
character as ^^xx.  Interpreting its output thus relies on the output
actually being interpretable in the given encoding.  We might have a bit
more leeway here with XEmacs out of the race (XEmacs' utf-8 encoding and
reencoding was not round-trippable).
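
The wrapping problem can be illustrated with a minimal Python sketch.
This is not TeX's actual code: the helper name `wrap_bytes' and the
sample line are made up for illustration.  It cuts a UTF-8 byte string
every 79 bytes and shows that a two-byte character straddling the cut
leaves chunks that no longer decode as UTF-8.

```python
# Sketch only: mimics byte-wise line wrapping, not TeX's real implementation.
WIDTH = 79  # wrap width in bytes, as mentioned above


def wrap_bytes(data, width=WIDTH):
    """Split data into chunks of at most `width` bytes, ignoring character boundaries."""
    return [data[i:i + width] for i in range(0, len(data), width)]


# 78 ASCII bytes followed by "é" (two bytes in UTF-8: 0xC3 0xA9), so the
# cut after byte 79 falls between the two bytes of "é".
line = ("x" * 78 + "é" + " rest of the message").encode("utf-8")
chunks = wrap_bytes(line)

for chunk in chunks:
    try:
        print(chunk.decode("utf-8"))
    except UnicodeDecodeError as err:
        print("cannot decode chunk as UTF-8:", err.reason)
```

Joining the chunks before decoding restores the original text, so a
consumer that wants to be safe has to undo the wrapping at the byte
level first.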

-- 
David Kastrup



Re: [AUCTeX] wrong encoding for *output* buffer

2018-01-31 Thread Werner LEMBERG
>> The only thing which looks strange to me is that, the mode line in
>> the `*xxx output* buffer starts with
>>
>>  1:**
>> (but the UTF-8 contents of the log file as emitted by XeTeX is
>> correctly displayed).
> 
> In that case, latin-1 is the coding system for saving that buffer
> and utf-8 is for decoding the output from external process.  Of
> several coding systems, the one for saving the buffer is the most
> important for most cases, so usually only that one is displayed in
> the mode line.

Thanks for the explanation, which I already knew :-)

My question was probably not precise enough: I wonder why auctex
doesn't set the buffer encoding also (derived from the master file's
local variables), given that auctex itself generates the *xxx output*
buffer.


Werner



Re: [AUCTeX] wrong encoding for *output* buffer

2018-01-31 Thread Ikumi Keita
Hi Werner,

> Werner LEMBERG  writes:
> The only thing which looks strange to me is that, the mode line in the
> `*xxx output* buffer starts with

>  1:**

> (but the UTF-8 contents of the log file as emitted by XeTeX is
> correctly displayed).

In that case, latin-1 is the coding system for saving that buffer and
utf-8 is the one for decoding the output from the external process.  Of
the several coding systems, the one for saving the buffer is the most
important in most cases, so usually only that one is displayed in the
mode line.

Emacs assigns several coding systems separately according to their
purposes: saving the buffer, decoding the output from a process, encoding
the input to a process, decoding the keyboard input from a text terminal,
encoding the screen output to a text terminal...  You can see three of
them in a form like

EEE:**

when you run "emacs -nw" on a text terminal.  See the doc string of the
variable `mode-line-mule-info' for details.

More detailed information about the coding systems associated with the
buffer can be displayed via C-h C or M-x describe-coding-system.

Best,
Ikumi Keita



Re: [AUCTeX] wrong encoding for *output* buffer

2018-01-31 Thread Werner LEMBERG

> AUCTeX determines the coding system for reading from the output of
> asynchronous TeX process by the function
> `TeX-adjust-process-coding-system'.  It basically obeys the coding
> system of the command buffer, i.e., the buffer in which `C-c C-c' or
> something like it is issued.  So I expect that it usually works
> well.

Thanks for the explanation.  Right now, I can't repeat the issue.

The only thing which looks strange to me is that the mode line in the
`*xxx output*' buffer starts with

 1:**

(but the UTF-8 contents of the log file as emitted by XeTeX is
correctly displayed).

> [...]  putting the
>
> %%% coding: utf-8
>
> cookie in your ASCII sub file would do the trick.  This makes the
> local value of `buffer-file-coding-system' to be the specified
> value, so the utf-8 output from xelatex would be decoded correctly.

I will try that as soon as I encounter the problem again.

> If this is not the case, I'm grateful if you provide sample xelatex
> documents to examine.

Will do!


Werner



Re: [AUCTeX] wrong encoding for *output* buffer

2018-01-26 Thread Ikumi Keita
Hi Werner,

> Werner LEMBERG  writes:
> [git commit 4b66b9f60e3ce4a552bd4f3230b659347add1446]

> Folks,

> I have the following in my master document.

>   %%% Local Variables:
>   %%% coding: utf-8
>   %%% mode: latex
>   %%% TeX-engine: xetex
>   %%% TeX-PDF-mode: t
>   %%% TeX-master: t
>   %%% End:

> However, the `*xxx output* buffer (showing the compilation results of
> xetex) is in latin-1 encoding – I guess this is due to

>   (set-language-environment "latin-1")
>   (setq-default buffer-file-coding-system 'latin-1)

> in my `~/.emacs' file...

> What must I do so that auctex obeys the local encoding variables in my
> master document, thus overriding `~/.emacs'?  [AFAICS, xetex *can* be
> forced to use non-UTF8 legacy encodings, so relying on `TeX-engine' is
> probably not sufficient.]

AUCTeX determines the coding system for reading from the output of
asynchronous TeX process by the function
`TeX-adjust-process-coding-system'.  It basically obeys the coding
system of the command buffer, i.e., the buffer in which `C-c C-c' or
something like it is issued.  So I expect that it usually works well.

Does the symptom you described occur for all xelatex documents or only for
some particular documents?  If the latter, one possible guess is that
(1) The document is divided into multiple files and
(2) The sub file of the master file has no multibyte characters, i.e.,
contains only ASCII characters and
(3) The command buffer is the one for that ASCII sub file.
In that case, the local value of `buffer-file-coding-system' of the sub
file buffer can be a kind of `undecided-*', and Emacs eventually uses
the value of `default-process-coding-system' for reading from the output
of xelatex.  With your settings, `default-process-coding-system' is
(iso-latin-1-* . iso-latin-1-*), so the utf-8 characters in the output
are not decoded correctly.
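
What "not decoded correctly" looks like can be sketched in a few lines
of Python.  This is only an illustration of the byte-level effect, not
of what Emacs does internally, and the sample string is an assumption:
the same UTF-8 bytes are decoded once as latin-1 and once as utf-8.

```python
# Sketch only: shows what decoding UTF-8 bytes with the wrong coding system does.
text = "éàù"
raw = text.encode("utf-8")         # the bytes a UTF-8 engine would write

as_latin1 = raw.decode("latin-1")  # decoding with a latin-1 coding system
as_utf8 = raw.decode("utf-8")      # decoding with a utf-8 coding system

print(len(raw))         # each of the three characters takes two bytes
print(repr(as_latin1))  # one latin-1 character per byte, so the text is garbled
print(as_utf8)          # the original three characters
```

Decoding as latin-1 never fails, because every byte is a valid latin-1
character; the mismatch only shows up as garbled text in the buffer.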

If this guess is right, putting the
%%% coding: utf-8
cookie in your ASCII sub file would do the trick.  This sets the local
value of `buffer-file-coding-system' to the specified value, so the
utf-8 output from xelatex would be decoded correctly.

If this is not the case, I would be grateful if you could provide sample
xelatex documents to examine.

Regards,
Ikumi Keita
