Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

Ross Moore Sun, 21 Feb 2021 03:46:36 -0800

Hi David.

On 21 Feb 2021, at 10:12 pm, David Carlisle <d.p.carli...@gmail.com> wrote:


I think that should be taken up with the xstring maintainers.

Is xstring intended for use with XeTeX?
I suspect not.
But anyway, there are still issues with this.

(BTW, I wrote this before Jonathan Kew’s response.)


I don't think there is any reasonable way to say you can comment out parts of a
file in a different encoding.

I’m not convinced that this ought to be correct for TeX-based software.

TeX (not necessarily XeTeX) has always operated as a finite-state machine.
It *should* be possible to say that this part is encoded as such-and-such,
and a later part encoded differently.

I fully understand that editor software external to TeX might well have difficulties
with files that mix encodings this way, but TeX itself has always been byte-based
and should remain that way.

A comment character is meant to be viewed as saying that:
 *everything else on this line is to be ignored*
– that’s the impression given by TeX documentation.

If it is the documentation that is incorrect, then it should certainly be 
clarified.

For XeTeX and this particular example, it’s probably just a matter of checking
that the non-UTF-8 characters occur *after* a UTF-8 ‘%’, and not issuing
an error message under these conditions.
A warning, maybe, but not an error.


The file encoding specifies the byte stream interpretation before any TeX tokenization.
If the file cannot be interpreted as utf-8 then it can't be interpreted at all.

Why not?
Why can you not have a macro (presumably best on a single line by itself)
that says what follows next is to be interpreted in a different way to what
came previously, until the next switch that returns to UTF-8 or whatever?


If XeTeX is based on eTeX, then this should be possible in that setting.
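
As a sketch of what such a switch could look like: XeTeX documents a
\XeTeXinputencoding primitive that changes how the following lines of the
current file are decoded. The fragment below assumes it behaves as described
in the XeTeX reference; the commented text is hypothetical, and this is an
illustration rather than something tested against xstring.

```latex
% Hypothetical fragment of a mixed-encoding file (sketch only).
% Lines up to and including the next one are read as UTF-8.
\XeTeXinputencoding "iso-8859-1"
% Assumption: from here on each byte is read as one Latin-1 character,
% so accented letters stored as single bytes are no longer invalid input.
\XeTeXinputencoding "UTF-8"
% From here on the file is decoded as UTF-8 again.
```

As far as I know the switch applies to the file in which it appears, so it
would have to be placed in the file that actually contains the differently
encoded bytes.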


Even replacing by U+FFFD is being lenient.

David




On Sun, 21 Feb 2021 at 11:04, jfbu <j...@free.fr> wrote:
Hi,

consider this

\documentclass{article}
\usepackage{xstring}
\begin{document}
\end{document}

and call it xexstring.tex

Then xelatex xexstring triggers 136 warnings of the type

Invalid UTF-8 byte or sequence at line 35 replaced by U+FFFD.

Looking at file

/usr/local/texlive/2020/texmf-dist/tex/generic/xstring/xstring.tex

I see that this matches the use of latin-1 encoded characters in comments.

Notice that it is not a user decision here to use a latin-1
encoded file.

In fact I encountered this in a file I was given where
the xstring package was loaded by another package.
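
A possible workaround on the user side, sketched under the assumption that
xstring.tex really is latin-1 encoded and that XeTeX's \XeTeXdefaultencoding
primitive (which sets the encoding for files opened afterwards) behaves as
documented. This is not a confirmed fix, only an illustration of the idea:

```latex
\documentclass{article}
% Assumption: files opened after this point are decoded as Latin-1.
\XeTeXdefaultencoding "iso-8859-1"
\usepackage{xstring}
% Restore UTF-8 as the default for files opened later.
\XeTeXdefaultencoding "UTF-8"
\begin{document}
\end{document}
```

When xstring is loaded indirectly, the same pair of lines would have to
surround the package that loads it, which also changes how that package's
own files are decoded.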

Regards,

Jean-François


Cheers.

Ross


Dr Ross Moore
Department of Mathematics and Statistics
12 Wally’s Walk, Level 7, Room 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M: +61 407 288 255  |  E: ross.mo...@mq.edu.au
http://www.maths.mq.edu.au
