Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-22 Thread David Carlisle
On Mon, 22 Feb 2021 at 01:28, Ross Moore wrote: > Hi Jonathan, and others. > > > There’s actually a pretty easy fix, at least for XeLaTeX. > The package contains 2 files only: xstring.sty and xstring.tex. > The .sty is just a 1-liner to load the .tex. > > It could be beefed up with: > >

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread Ross Moore
Hi Jonathan, and others. On 22 Feb 2021, at 10:39 am, Jonathan Kew <jfkth...@gmail.com> wrote: On 21/02/2021 22:55, Ross Moore wrote: The file reading has failed before any tex accessible processing has happened (see the ebcdic example in the TeXbook) OK. Also pdfTeX has no trouble

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread Jonathan Kew
On 21/02/2021 22:55, Ross Moore wrote: The file reading has failed before any tex accessible processing has happened (see the ebcdic example in the TeXbook) OK. But that’s changing the meaning of bit-order, yes? Surely we can be past that. No, it's not about bit-order; it's about changing

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread Jonathan Kew
On 21/02/2021 22:55, Ross Moore wrote: Hi David, On 22 Feb 2021, at 8:43 am, David Carlisle wrote: Surely the line-end characters are already known, and the bits have been read up to that point *before* tokenisation. This is not a pdflatex inputenc

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread Ross Moore
Hi David, On 22 Feb 2021, at 8:43 am, David Carlisle <d.p.carli...@gmail.com> wrote: Surely the line-end characters are already known, and the bits have been read up to that point *before* tokenisation. This is not a pdflatex inputenc style utf-8 error failing to map a stream of

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread Jonathan Kew
On 21/02/2021 21:48, Bruno Le Floch wrote: I think your model of what XeTeX is doing is missing a step. It's important to distinguish two steps, which are a bit mixed up in some of the comments here. I'm not 100% sure either, so perhaps more knowledgeable people can chime in. - The file is

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread Bruno Le Floch
Hi Ross, On 2/21/21 10:42 PM, Ross Moore wrote: > Hi Ulrike, > >> On 22 Feb 2021, at 7:52 am, Ulrike Fischer wrote: >> >> On Sun, 21 Feb 2021 20:26:04, Ross Moore wrote: >> >> > Once you have encountered the (correct) comment character, >> > what follows on the rest of the line is going

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread David Carlisle
On Sun, 21 Feb 2021 at 20:27, Ross Moore wrote: > Hi David, > > Surely the line-end characters are already known, and the bits > have been read up to that point *before* tokenisation. > This is not a pdflatex inputenc style utf-8 error failing to map a stream of tokens. It is at the file

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread Ross Moore
Hi Ulrike, On 22 Feb 2021, at 7:52 am, Ulrike Fischer <ne...@nililand.de> wrote: On Sun, 21 Feb 2021 20:26:04, Ross Moore wrote: > Once you have encountered the (correct) comment character, > what follows on the rest of the line is going to be discarded, > so its encoding is

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread Ulrike Fischer
On Sun, 21 Feb 2021 20:26:04, Ross Moore wrote: > Once you have encountered the (correct) comment character, > what follows on the rest of the line is going to be discarded, > so its encoding is surely irrelevant. > > Why should the whole line need to be fully tokenised, > before the

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread Ross Moore
Hi David, On 21 Feb 2021, at 11:02 pm, David Carlisle <d.p.carli...@gmail.com> wrote: I don't think there is any reasonable way to say you can comment out parts of a file in a different encoding. I’m not convinced that this ought to be correct for TeX-based software. TeX (not

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread David Carlisle
On Sun, 21 Feb 2021 at 11:47, Ross Moore wrote: > Hi David. > > On 21 Feb 2021, at 10:12 pm, David Carlisle wrote: > > I think that should be taken up with the xstring maintainers. > > > Is xstring intended for use with XeTeX? > I suspect not. > But anyway, there are still issues with

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread Ross Moore
Hi David. On 21 Feb 2021, at 10:12 pm, David Carlisle <d.p.carli...@gmail.com> wrote: I think that should be taken up with the xstring maintainers. Is xstring intended for use with XeTeX? I suspect not. But anyway, there are still issues with this. (BTW, I wrote this before Jonathan

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread Jonathan Kew
On 21/02/2021 11:12, David Carlisle wrote: > I think that should be taken up with the xstring maintainers. Yes, I would agree this is an xstring problem. It looks like in an older version the file was utf-8. I suspect someone saved it as Latin-1 in the course of editing, probably without

Re: [XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread David Carlisle
I think that should be taken up with the xstring maintainers. I don't think there is any reasonable way to say you can comment out parts of a file in a different encoding. The file encoding specifies the byte stream interpretation before any TeX tokenization. If the file cannot be interpreted as

[XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

2021-02-21 Thread jfbu
Hi, consider this

\documentclass{article}
\usepackage{xstring}
\begin{document}
\end{document}

and call it xexstring.tex. Then

xelatex xexstring

triggers 136 warnings of the type

Invalid UTF-8 byte or sequence at line 35 replaced by U+FFFD.

Looking at file
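
The ordering discussed in the replies (the bytes of a line are interpreted as UTF-8 first, and only afterwards does the comment character get a chance to discard anything) can be illustrated with a small Python sketch. This is only a model of that ordering, not XeTeX's actual file reader; the sample line, the helper names `decode_line` and `strip_comment`, and the simplified comment rule are all made up for the illustration.

```python
# Sketch only: models "decode the bytes first, look for % afterwards".
# This is NOT XeTeX's reader; names and the sample line are hypothetical.

REPLACEMENT = "\ufffd"  # U+FFFD, the replacement character named in the warning


def decode_line(raw: bytes):
    """Decode one line as UTF-8; invalid bytes become U+FFFD.

    Returns the decoded text and the number of replacements made.
    """
    text = raw.decode("utf-8", errors="replace")
    return text, text.count(REPLACEMENT)


def strip_comment(text: str) -> str:
    """Drop everything from the first unescaped % onward (simplified rule)."""
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character, e.g. \%
            continue
        if text[i] == "%":
            return text[:i]
        i += 1
    return text


# A line whose comment contains a Latin-1 encoded e-acute (byte 0xE9).
latin1_line = "\\relax % comment\u00e9".encode("latin-1")

decoded, bad = decode_line(latin1_line)   # decoding already hit the bad byte
visible = strip_comment(decoded)          # the comment is dropped only now

print(f"replacements during decoding: {bad}")
print(f"text left after comment removal: {visible!r}")
```

In this model the replacement is counted during decoding even though the offending byte sits entirely inside the comment, which matches the behaviour reported above. Re-encoding the same line as UTF-8 gives zero replacements.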