Hi David,

On 22 Feb 2021, at 8:43 am, David Carlisle <d.p.carli...@gmail.com> wrote:

Surely the line-end characters are already known, and the bits&bytes
have been read up to that point *before* tokenisation.

This is not a pdflatex inputenc-style UTF-8 error, a failure to map a stream of 
tokens.

It is at the file-reading stage: if you have the file encoding wrong, you do not 
know reliably where the ends of lines are, and you have not interpreted the input 
as TeX at all, so the comment character really can't have an effect here.

Ummm. Is that really how XeTeX does it?
How then does Jonathan's
   \XeTeXdefaultencoding "iso-8859-1"
ever work?
Just a rhetorical question; don't bother answering.   :-)
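(For concreteness, my mental model of that primitive, with legacy.tex standing
in for some hypothetical Latin-1 file:

   \XeTeXdefaultencoding "iso-8859-1"  % files opened from now on decode as Latin-1
   \input legacy
   \XeTeXdefaultencoding "utf-8"       % restore the default for later files

i.e. it changes the decoder applied to files opened afterwards, so it acts at
the file-reading stage, before any tokenisation.)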

This mapping is invisible to the TeX macro layer, just as you can change the 
internal character-code mapping in classic TeX to take an EBCDIC stream; if you 
do that and then read an ASCII file, you get rubbish with no hope of recovery.



So I don't think such a switch should be made automatically just to avoid 
reporting encoding errors.

I reported the issue at xstring here:
https://framagit.org/unbonpetit/xstring/-/issues/4


I looked at what you said there, and some of it doesn't seem to be in accord with
my TeX Live installations.

viz.

/usr/local/texlive/2016/.../xstring.tex:\expandafter\ifx\csname @latexerr\endcsname\relax% on n'utilise pas LaTeX ?
/usr/local/texlive/2016/.../xstring.tex:\fi% fin des d\'efinitions LaTeX
/usr/local/texlive/2016/.../xstring.tex:%   - Le package ne n\'ecessite plus LaTeX et est d\'esormais utilisable sous
/usr/local/texlive/2016/.../xstring.tex:%     Plain eTeX.
/usr/local/texlive/2017/.../xstring.tex:% conditions of the LaTeX Project Public License, either version 1.3
/usr/local/texlive/2017/.../xstring.tex:% and version 1.3 or later is part of all distributions of LaTeX
/usr/local/texlive/2017/.../xstring.tex:\expandafter\ifx\csname @latexerr\endcsname\relax% on n'utilise pas LaTeX ?
/usr/local/texlive/2017/.../xstring.tex:\fi% fin des d\'efinitions LaTeX
/usr/local/texlive/2017/.../xstring.tex:%   - Le package ne n\'ecessite plus LaTeX et est d\'esormais utilisable sous
/usr/local/texlive/2017/.../xstring.tex:%     Plain eTeX.
/usr/local/texlive/2018/.../xstring.tex:% !TeX encoding = ISO-8859-1
/usr/local/texlive/2018/.../xstring.tex:% Licence    : Released under the LaTeX Project Public License v1.3c %
/usr/local/texlive/2018/.../xstring.tex:%     Plain eTeX.
/usr/local/texlive/2019/.../xstring.tex:% !TeX encoding = ISO-8859-1
/usr/local/texlive/2019/.../xstring.tex:% Licence    : Released under the LaTeX Project Public License v1.3c %
/usr/local/texlive/2019/.../xstring.tex:%     Plain eTeX.

Prior to 2018, the accents in comments used ASCII (the \'e accent macros), so 
the file was also valid UTF-8, though not intentionally so.

In 2018, the accents in comments became Latin-1 characters.
A 1st line was added:  % !TeX encoding = ISO-8859-1
to indicate this.

Such directive comments are useless, except at the beginning of the main 
document source.
They are for front-end software, not TeX processing, right?
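For example, if I understand the convention, a front end such as TeXworks or
TeXstudio reads lines like these near the top of a file when opening it, while
TeX itself sees only comments:

   % !TeX encoding = ISO-8859-1
   % !TeX program = xelatex

(the program line is just another instance of the same convention, not
something xstring uses).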

Jonathan, David:
so far as I can tell, it was *never* in UTF-8 with precomposed accented characters.



David


… that says what follows next is to be interpreted in a different way from what 
came previously?
Until the next switch, which returns to UTF-8 or whatever?
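If I read the XeTeX reference correctly, that is what \XeTeXinputencoding does
within the current file: it rescans from the next line onward in the named
encoding, until the next switch, e.g.

   \XeTeXinputencoding "iso-8859-1"
   % ... lines whose bytes should be decoded as Latin-1 ...
   \XeTeXinputencoding "utf-8"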


If XeTeX is based on eTeX, then this should be possible in that setting.


Even replacing by U+FFFD is being lenient.

Why has the mouth not realised that this information is to be discarded?
Then no replacement is required at all.

The file reading has failed before any TeX-accessible processing has happened 
(see the EBCDIC example in The TeXbook).

OK.
But that's changing the interpretation of the raw bytes, yes?
Surely we can be past that.



\danger \TeX\ always uses the internal character code of Appendix~C
for the standard ASCII characters,
regardless of what external coding scheme actually appears in the files
being read.  Thus, |b| is 98 inside of \TeX\ even when your computer
normally deals with ^{EBCDIC} or some other non-ASCII scheme; the \TeX\
software has been set up to convert text files to internal code, and to
convert back to the external code when writing text files.


The file encoding is failing at the "convert text files to internal code" 
stage, which is before the line buffer of characters is consulted to produce the 
stream of tokens based on catcodes.
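A one-line check of that internal-code claim, runnable in any engine:

   \message{internal code of b = \number`b}  % prints: internal code of b = 98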

Yes, OK; so my model isn’t up to it, as Bruno said.
 … And Jonathan has commented.

Also, pdfTeX has no trouble with an xstring example.
It just seems pretty crazy that the comments need to be altered
for that package to be used with XeTeX.
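(A workaround sketch that, I believe, avoids editing the package: read it in
XeTeX's raw-bytes mode, since the offending accents sit only in comments, which
get discarded anyway.

   \XeTeXdefaultencoding "bytes"   % pass 8-bit input through, no UTF-8 validation
   \input xstring
   \XeTeXdefaultencoding "utf-8"   % restore the normal default

The "bytes" value is documented in the XeTeX reference, if I recall correctly.)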





David




Cheers, and thanks for this discussion.


Ross


Dr Ross Moore
Department of Mathematics and Statistics
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M: +61 407 288 255  |  E: ross.mo...@mq.edu.au
http://www.maths.mq.edu.au