Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-28 Thread Paul Wise
On Tue, 2017-06-27 at 19:49 -0700, Paul Hardy wrote:
> 1) Serving debian-policy pages on Debian servers as UTF-8 documents,
> as an interim measure.

I think we would want to do this for all *.txt documents that are UTF-8 and available on the website. First we need the list of UTF-8 encoded text
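One way to build such a list is sketched below. This is only an illustration under stated assumptions: the function names `classify` and `list_utf8_text_files` are hypothetical, the thread does not describe how the website tree is laid out, and a file is treated as "UTF-8" when it contains non-ASCII bytes that decode cleanly as UTF-8 (pure ASCII files need no charset change).

```python
from pathlib import Path


def classify(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for a file's raw bytes."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def list_utf8_text_files(root: str) -> list[str]:
    """List *.txt files under root that are non-ASCII but valid UTF-8.

    Assumption: `root` is a local checkout of the published documents.
    """
    return sorted(
        str(path)
        for path in Path(root).rglob("*.txt")
        if path.is_file() and classify(path.read_bytes()) == "utf-8"
    )
```

Successful decoding is a heuristic, not proof: text in a legacy encoding rarely happens to be valid UTF-8, but it can.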

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-27 Thread Paul Hardy
Given all the discussion that has taken place, what do you think of:

1) Serving debian-policy pages on Debian servers as UTF-8 documents, as an interim measure.

2) Given that this is only a solution for web servers under Debian's control, given that using the UTF-8 signature is now accepted

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-25 Thread Paul Hardy
Paul,

On Sun, Jun 25, 2017 at 8:24 PM, Paul Wise wrote:
> On Sun, 2017-06-25 at 16:07 -0700, Paul Hardy wrote:
>
>> Earlier today, I sent the GNU less maintainer a two-line patch to the
>> "charset.c" file after my original email to him.
>
> I'm no expert on the less source

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-25 Thread Paul Wise
On Sun, 2017-06-25 at 16:07 -0700, Paul Hardy wrote:
> Earlier today, I sent the GNU less maintainer a two-line patch to the
> "charset.c" file after my original email to him.

I'm no expert on the less source code, but it seems to me that it will also hide U+FEFF characters after the first one.
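The distinction raised here, hiding only a leading U+FEFF versus hiding every occurrence, can be sketched as follows. This is not the patch to less (which is C code in "charset.c" and is not shown in the thread); `strip_leading_bom` is a hypothetical helper that only illustrates the intended behaviour.

```python
BOM = "\ufeff"  # U+FEFF, used as the signature at the start of a file


def strip_leading_bom(text: str) -> str:
    """Drop U+FEFF only when it is the very first character.

    Later occurrences are left untouched, so they remain visible
    to whatever displays the text.
    """
    return text[1:] if text.startswith(BOM) else text
```

By contrast, `text.replace(BOM, "")` would remove every U+FEFF, which is the behaviour the reply is concerned about.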

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-25 Thread Paul Hardy
On Sat, Jun 24, 2017 at 1:59 PM, Russ Allbery wrote:
> Russ Allbery writes:
>
>> I did a bit more research, and apparently this approach has become more
>> blessed again...
>
> Okay, I experimented with this, but unfortunately less displays the BOM at
> the

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-25 Thread Paul Hardy
[On the use of the UTF-8 signature, aka the BOM, at the start of a UTF-8 file]

On Sat, Jun 24, 2017 at 1:59 PM, Russ Allbery wrote:
> Russ Allbery writes:
>
>> I did a bit more research, and apparently this approach has become more
>> blessed again..
>
> Okay,

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-25 Thread Henrique de Moraes Holschuh
On Sat, 24 Jun 2017, Russ Allbery wrote:
> Russ Allbery writes:
> > I did a bit more research, and apparently this approach has become more
> > blessed again. I'm glad I looked it up! As of Unicode 5.0, the
...
> Okay, I experimented with this, but unfortunately less displays

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-24 Thread Russ Allbery
Russ Allbery writes:
> I did a bit more research, and apparently this approach has become more
> blessed again. I'm glad I looked it up! As of Unicode 5.0, the
> standard explicitly recommended against doing this, but the latest
> version of the standard is moderately positive

Bug#865713: Declaring a charset of UTF-8 for policy files (was: Re: Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature)

2017-06-24 Thread Russ Allbery
Colin Watson writes:
> On Fri, Jun 23, 2017 at 11:49:20PM -0700, Russ Allbery wrote:
>> I'm still a bit dubious about this, since I don't believe editors and
>> generators normally add it, but given how we generate the text versions
>> of the documents, it's relatively easy

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-24 Thread Russ Allbery
Paul Hardy writes:
> Alternatively, if convenient, you could convert the non-breaking space
> characters to a plain space in that text file in a script. That will
> avoid the problem until you need some other non-ASCII character in the
> file other than non-breaking space.
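The conversion suggested here could look roughly like the sketch below. It is an assumption about how such a script might be written, not something taken from the debian-policy build; `replace_nbsp` and the sample string are hypothetical.

```python
def replace_nbsp(text: str) -> str:
    """Replace every NO-BREAK SPACE (U+00A0) with a plain ASCII space."""
    return text.replace("\u00a0", " ")


# Hypothetical sample line containing a non-breaking space.
sample = "Version\u00a04.0.0"
cleaned = replace_nbsp(sample)

# After the replacement the sample is pure ASCII, so the charset a
# browser guesses no longer matters for this text.
print(cleaned.isascii())
```

As the message notes, this only helps while U+00A0 is the sole non-ASCII character in the file.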

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-24 Thread Paul Hardy
On Sat, Jun 24, 2017 at 2:51 AM, Colin Watson wrote:
> On Fri, Jun 23, 2017 at 11:49:20PM -0700, Russ Allbery wrote:
>> I'm still a bit dubious about this, since I don't believe editors and
>> generators normally add it, but given how we generate the text versions of
>> the

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-24 Thread Colin Watson
On Fri, Jun 23, 2017 at 11:49:20PM -0700, Russ Allbery wrote:
> I'm still a bit dubious about this, since I don't believe editors and
> generators normally add it, but given how we generate the text versions of
> the documents, it's relatively easy to add a leading BOM and seems
> harmless.

I'll

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-24 Thread Russ Allbery
Russ Allbery writes:
> I don't believe it's correct to expect UTF-8 files to include this.
> I've heard of BOM marks used this way from the very early days of Unicode,
> but so far as I understand it, the world has largely given up on this
> approach and UTF-8 generators do not

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-24 Thread Russ Allbery
Paul Hardy writes:
> That might not be the only UTF-8 that appears in such files someday
> though, so a more general solution would be to start the file with the
> UTF-8 signature, aka the Byte Order Mark (BOM). This is the UTF-8
> encoding of U+FEFF, which is 0xEF 0xBB
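A minimal sketch of prepending the signature, using the `codecs.BOM_UTF8` constant from Python's standard library. The helper name `add_utf8_signature` is hypothetical, and nothing here reflects how the debian-policy build actually generates its text files.

```python
import codecs


def add_utf8_signature(data: bytes) -> bytes:
    """Return data starting with exactly one UTF-8 signature.

    codecs.BOM_UTF8 holds the UTF-8 encoding of U+FEFF. Data that
    already starts with it is returned unchanged, so the function
    can safely be applied more than once.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data
```

When writing text rather than bytes, opening the output file with `encoding="utf-8-sig"` has the same effect: Python emits the signature once at the start of the file.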

Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature

2017-06-23 Thread Paul Hardy
Package: debian-policy
Version: 4.0.0.2
Severity: minor
Tags: patch
Justification: garbled display (mojibake) in web browsers

Dear debian-policy Maintainers,

There are numerous non-breaking space characters (U+00A0) in:

https://www.debian.org/doc/packaging-manuals/upgrading-checklist.txt

These
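The kind of garbling reported here can be reproduced with a short sketch. It assumes the browser falls back to ISO-8859-1 when no charset is declared; the report as quoted does not say which fallback encoding was involved.

```python
# U+00A0 (NO-BREAK SPACE) as it is stored in a UTF-8 text file.
nbsp_utf8 = "\u00a0".encode("utf-8")

# Assumption: the client decodes those bytes as ISO-8859-1, not UTF-8.
misdecoded = nbsp_utf8.decode("iso-8859-1")

print(len(nbsp_utf8))    # number of bytes used for one character
print(repr(misdecoded))  # what the single character turns into
```

Every byte is a valid ISO-8859-1 character, so the wrong decoding never fails; it silently yields one extra visible character per non-breaking space.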