Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-13 Thread Hans Aberg
On 13 Jul 2012, at 00:34, Michael Everson wrote: On 12 Jul 2012, at 23:27, Hans Aberg wrote: On 12 Jul 2012, at 23:47, Michael Everson wrote: ... Is it in print? ... If so, then it should

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-13 Thread Michael Everson
On 13 Jul 2012, at 09:49, Hans Aberg wrote: Local documents on your computer don't do me any good. FYI, in the TeX world, one can go in on CTAN http://ctan.org/ and make a search http://ctan.org/search/. However, with the TeX Live package http://www.tug.org/texlive/ installed, that is

Re: pre-HTML5 and the BOM

2012-07-13 Thread Martin J. Dürst
On 2012/07/13 0:12, Leif Halvard Silli wrote: Doug Ewell, Wed, 11 Jul 2012 09:12:46 -0600: and people who want to create or modify UTF-8 files which will be consumed by a process that is intolerant of the signature should not use Notepad. That goes for HTML (pre-5) pages [snip] HTML5-parsers

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-13 Thread Julian Bradfield
On 2012-07-12, Michael Everson ever...@evertype.com wrote: On 12 Jul 2012, at 22:20, Julian Bradfield wrote: But wanting to do so would be crazy. My mu-nu ligature is, as far as I know, used only by me (and co-authors who let me do the typesetting), and so if Unicode has any sanity left, it

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-13 Thread Michael Everson
On 13 Jul 2012, at 11:07, Julian Bradfield wrote: On 2012-07-12, Michael Everson ever...@evertype.com wrote: On 12 Jul 2012, at 22:20, Julian Bradfield wrote: But wanting to do so would be crazy. My mu-nu ligature is, as far as I know, used only by me (and co-authors who let me do the

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-13 Thread Hans Aberg
On 13 Jul 2012, at 10:57, Michael Everson wrote: On 13 Jul 2012, at 09:49, Hans Aberg wrote: Local documents on your

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-13 Thread Julian Bradfield
On 2012-07-13, Michael Everson ever...@evertype.com wrote: On 13 Jul 2012, at 11:07, Julian Bradfield wrote: So... U+1D7CC MATHEMATICAL ITALIC SMALL MU NU LIGATURE, since it's published and (assuming the work is worthy; I cannot judge) might be cited by others. It *might*, by some hapless

Re: pre-HTML5 and the BOM

2012-07-13 Thread Leif Halvard Silli
Leif Halvard Silli, Fri, 13 Jul 2012 13:44:42 +0200: I do at least not think that user agents that want to be conforming pre-HTML5 user agents have any justification for ignoring the BOM. * The effect of the BOM - as encoding signature - is not discussed anywhere in HTML4 or in the

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-13 Thread Asmus Freytag
The time to encode this ad-hoc symbol would arrive some time after others republish your proof *without* choosing a different symbol...at which point it would have become part of a convention. A./ On 7/13/2012 5:20 AM, Julian Bradfield wrote: On 2012-07-13, Michael Everson

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-13 Thread Asmus Freytag
On 7/13/2012 3:07 AM, Julian Bradfield wrote: My colleagues in the Edinburgh PEPA group did try to get their pet symbol encoded (a bowtie where the two triangles overlap somewhat rather than just touching), but were refused; although that symbol now appears in hundreds of papers by dozens of

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-13 Thread Asmus Freytag
On 7/13/2012 1:57 AM, Michael Everson wrote: That document is 164 pages long. I would be interested in examining it after someone else has done the background work of a first pass at identifying which characters are already encoded. This is sort of an emoji/wingdings/webdings scenario, I

Re: pre-HTML5 and the BOM

2012-07-13 Thread Jukka K. Korpela
2012-07-13 16:12, Leif Halvard Silli wrote: The kind of BOM intolerance I know about in user agents is that some text browsers and IE5 for Mac (abandoned) convert the BOM into a (typically empty) line at the start of the body element. I wonder if there is any evidence of browsers currently in

Re: Too narrowly defined: DIVISION SIGN COLON

2012-07-13 Thread Michael Everson
The TeX collection includes things which are not only mathematical symbols. No need to be so dismissive, Asmus. On 13 Jul 2012, at 14:24, Asmus Freytag wrote: On 7/13/2012 1:57 AM, Michael Everson wrote: That document is 164 pages long. I would be interested in examining it after someone

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-13 Thread Steven Atreju
Philippe Verdy verd...@wanadoo.fr wrote: |2012/7/12 Steven Atreju snatr...@googlemail.com: | UTF-8 is a bytestream, not multioctet(/multisequence). |Not even. UTF-8 is a text-stream, not made of arbitrary sequences of |bytes. It has a lot of internal semantics and constraints. Some things

Re: pre-HTML5 and the BOM

2012-07-13 Thread Leif H Silli
You sum up my views. The warnings appear as routine. Leif --- Original message --- From: Jukka K. Korpela jkorp...@cs.tut.fi To: unicode@unicode.org Sent: 13/7/'12, 15:31 2012-07-13 16:12, Leif Halvard Silli wrote: The kind of BOM intolerance I know about in user agents is

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-13 Thread Philippe Verdy
2012/7/13 Steven Atreju snatr...@googlemail.com: Philippe Verdy verd...@wanadoo.fr wrote: |2012/7/12 Steven Atreju snatr...@googlemail.com: | UTF-8 is a bytestream, not multioctet(/multisequence). |Not even. UTF-8 is a text-stream, not made of arbitrary sequences of |bytes. It has a lot

Re: pre-HTML5 and the BOM

2012-07-13 Thread Leif H Silli
Another myth, e.g. in Wikipedia, is that Unicode warns against the UTF-8 BOM; see the footnote en.m.wikipedia.org/wiki/UTF-8#cite_note-27 Leif --- Original message --- From: Jukka K. Korpela jkorp...@cs.tut.fi To: unicode@unicode.org Sent: 13/7/'12, 15:31 2012-07-13 16:12,

Re: pre-HTML5 and the BOM

2012-07-13 Thread Philippe Verdy
From: Jukka K. Korpela jkorp...@cs.tut.fi When the BOM is used in web pages or editors for UTF-8 encoded content it can sometimes introduce blank spaces or short sequences of strange-looking characters (such as ï»¿). For this reason, it is usually best for interoperability to omit the BOM,
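
The "strange-looking characters" mentioned in the quoted passage can be reproduced with a minimal Python sketch. It is only an illustration, not anything posted in the thread: the sample markup in `page` is made up, and Latin-1 is assumed as the wrong decoder. It shows the same UTF-8 bytes decoded three ways: as plain UTF-8 (the signature survives as U+FEFF), with Python's `utf-8-sig` codec (the signature is dropped), and as Latin-1 (the signature turns into three visible characters).

```python
import codecs

# Hypothetical page content, prefixed with the UTF-8 signature EF BB BF.
page = codecs.BOM_UTF8 + "<p>hello</p>".encode("utf-8")

# Plain UTF-8 decoding keeps the signature as the character U+FEFF.
as_utf8 = page.decode("utf-8")

# The "utf-8-sig" codec removes a leading signature if one is present.
as_sig = page.decode("utf-8-sig")

# A consumer that assumes Latin-1 sees the three signature bytes as text.
as_latin1 = page.decode("latin-1")

print(repr(as_utf8[0]))   # the BOM character
print(as_sig)             # markup without the BOM
print(as_latin1[:3])      # the stray characters
```

A consumer that is intolerant of the signature behaves like the first or third case; one that tolerates it behaves like the second.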

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-13 Thread Eli Zaretskii
Date: Fri, 13 Jul 2012 16:04:44 +0200 From: Steven Atreju snatr...@googlemail.com For example, this mail is written in an UTF-8 enabled vi(1) basically from 1986, in UTF-8 encoding («Schöne Überraschung, gelle?» No, it isn't: User-Agent: S-nail 12.5 7/5/10;s-nail-9-g517ac44-dirty

Re: pre-HTML5 and the BOM

2012-07-13 Thread David Starner
On Fri, Jul 13, 2012 at 9:11 AM, Leif H Silli xn--mlform-...@xn--mlform-iua.no wrote: Another myth, e.g. in Wikipedia, is that Unicode warns against the UTF-8 BOM; see the footnote en.m.wikipedia.org/wiki/UTF-8#cite_note-27 Wikipedia says The Unicode standard recommends against the BOM for

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-13 Thread Steven Atreju
Eli Zaretskii e...@gnu.org wrote: | For example, this mail is | written in an UTF-8 enabled vi(1) basically from 1986, in UTF-8 | encoding («Schöne Überraschung, gelle?» | |No, it isn't: | |Content-Type: text/plain; charset=ISO-8859-1 Oh, it's really terrible. I do have

Re: pre-HTML5 and the BOM

2012-07-13 Thread Jukka K. Korpela
2012-07-13 22:37, David Starner wrote: Wikipedia says "The Unicode standard recommends against the BOM for UTF-8." and refers to page 30 of the Unicode Standard, version 6.0, that says "Use of a BOM is neither required nor recommended for UTF-8..." Calling it a myth seems bizarre. “Not

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-13 Thread Steven Atreju
Philippe Verdy verd...@wanadoo.fr wrote: |2012/7/13 Steven Atreju snatr...@googlemail.com: | Philippe Verdy verd...@wanadoo.fr wrote: | | |2012/7/12 Steven Atreju snatr...@googlemail.com: | | UTF-8 is a bytestream, not multioctet(/multisequence). | |Not even. UTF-8 is a text-stream, not

BOM ambiguity?

2012-07-13 Thread Stephan Stiller
As an aside to the BOM discussion - something I've always been meaning to ask. So there is a BOM-ambiguity when a file starts with FF FE and then a couple of U+0000 characters, yes? Because this could be either UTF-16 or UTF-32 under little-endianness. Has this been pointed out and
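
The byte pattern in question can be looked at concretely with a small Python sketch. It is an illustration under stated assumptions, not code from the thread: `sniff_bom` is a hypothetical helper, and the sample bytes are invented. The helper lists every encoding whose BOM matches the start of the data, and the sample shows that FF FE 00 00 matches both the UTF-32LE signature and the UTF-16LE signature followed by U+0000.

```python
import codecs

def sniff_bom(data: bytes) -> list:
    """Return the names of all encodings whose BOM matches the start of data."""
    candidates = [
        ("utf-32-le", codecs.BOM_UTF32_LE),  # FF FE 00 00
        ("utf-32-be", codecs.BOM_UTF32_BE),  # 00 00 FE FF
        ("utf-16-le", codecs.BOM_UTF16_LE),  # FF FE
        ("utf-16-be", codecs.BOM_UTF16_BE),  # FE FF
        ("utf-8", codecs.BOM_UTF8),          # EF BB BF
    ]
    return [name for name, bom in candidates if data.startswith(bom)]

# Invented sample: FF FE 00 00 followed by 41 00 00 00.
ambiguous = b"\xff\xfe\x00\x00A\x00\x00\x00"

matches = sniff_bom(ambiguous)

# Read as UTF-32LE: a 4-byte BOM, then the single character "A".
as_utf32 = ambiguous[4:].decode("utf-32-le")

# Read as UTF-16LE: a 2-byte BOM, then U+0000, "A", U+0000.
as_utf16 = ambiguous[2:].decode("utf-16-le")

print(matches)
print(repr(as_utf32), repr(as_utf16))
```

Both readings decode without error, so the signature alone does not settle which one was meant; a sniffer has to prefer one reading, typically the longer signature, or rely on information from outside the file.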

More emoji - (was Re: Too narrowly defined: DIVISION SIGN COLON)

2012-07-13 Thread Asmus Freytag
On 7/13/2012 6:37 AM, Michael Everson wrote: The TeX collection includes things which are not only mathematical symbols. No need to be so dismissive, Asmus. No need to be so ... - my comment was carefully worded to apply explicitly to mathematical usage only - and was issued in the context

Re: BOM ambiguity?

2012-07-13 Thread Philippe Verdy
Null characters are almost always avoided in interchanged plain texts. This is not a practical problem. The use of nulls as significant characters is extremely exceptional, as they almost always require an envelope format to specify data lengths. This envelope format is in a file that is not

Re: BOM ambiguity?

2012-07-13 Thread Stephan Stiller
Null characters are almost always avoided in interchanged plain texts. This is not a practical problem. The use of nulls as significant characters is extremely exceptional Yes, but still I think that the BOM ambiguity needs to be documented. If it already is, the documentation isn't visible or

Re: pre-HTML5 and the BOM

2012-07-13 Thread David Starner
On Fri, Jul 13, 2012 at 1:29 PM, Jukka K. Korpela jkorp...@cs.tut.fi wrote: 2012-07-13 22:37, David Starner wrote: Wikipedia says The Unicode standard recommends against the BOM for UTF-8. and refers to page 30 of the Unicode Standard, version 6.0, that says Use of a BOM is neither required

Re: BOM ambiguity?

2012-07-13 Thread Asmus Freytag
A) treating NUL as ignorable is really deep legacy. Totally no longer appropriate for modern data. B) there are many Unicode character codes with leading or trailing or other NUL bytes, so UTF-16 and UTF-32 cannot be exchanged under the assumption that NUL is ignorable A./ On 7/13/2012 2:16

Re: BOM ambiguity?

2012-07-13 Thread Philippe Verdy
2012/7/13 Asmus Freytag asm...@ix.netcom.com: A) treating NUL as ignorable is really deep legacy. Totally no longer appropriate for modern data. I did not say that. But modern data heavily uses bytes as fillers for padding, or as terminators in various envelope formats. There are some more

Re: pre-HTML5 and the BOM

2012-07-13 Thread Asmus Freytag
On 7/13/2012 2:42 PM, David Starner wrote: On Fri, Jul 13, 2012 at 1:29 PM, Jukka K. Korpela jkorp...@cs.tut.fi wrote: 2012-07-13 22:37, David Starner wrote: Wikipedia says The Unicode standard recommends against the BOM for UTF-8. and refers to page 30 of the Unicode Standard, version 6.0,

Re: pre-HTML5 and the BOM

2012-07-13 Thread Philippe Verdy
It would break if the only place to put a BOM were the start of a file. But as I propose, we allow BOMs to occur anywhere to specify which encoding to use to decode what follows each one, even shell scripts would work (you could place the BOM on a comment line after a hash symbol, that

Re: BOM ambiguity?

2012-07-13 Thread Ken Whistler
On 7/13/2012 1:54 PM, Stephan Stiller wrote: So there is a BOM-ambiguity when a file starts with FF FE and then a couple of U+0000 characters, yes? Because this could be either UTF-16 or UTF-32 under little-endianness. Has this been pointed out and discussed beforehand? No, there is

Re: BOM ambiguity?

2012-07-13 Thread Philippe Verdy
Just eliminate the cases where you find U+0000. For plain-text files they are not useful. If you're trying to guess which encoding is used in an HTML or XML file, you won't find any null (because they are invalid in those formats, in all encodings even with ISO-8859-*). In those conditions, there's

Re: BOM ambiguity?

2012-07-13 Thread John W Kennedy
On Jul 13, 2012, at 4:54 PM, Stephan Stiller wrote: As an aside to the BOM discussion - something I've always been meaning to ask. So there is a BOM-ambiguity when a file starts with FF FE and then a couple of U+0000 characters, yes? Because this could be either UTF-16 or UTF-32 under

Feedback on PRI #231: Bidi Parentheses Algorithm (Was: PRI #231: Bidi Parenthesis Algorithm)

2012-07-13 Thread CE Whitehead
Hi. I realize that the bidi parentheses algorithm is not currently being discussed on the list, but wanted to cc the list with my feedback (I've already sent it to unicode (using the form), but I wanted to make double sure that my feedback gets to the right place; also I've made a few edits

Re: BOM ambiguity?

2012-07-13 Thread Stephan Stiller
So there is a BOM-ambiguity when a file starts with FF FE and then a couple of U+0000 characters, yes? Because this could be either UTF-16 or UTF-32 under little-endianness. Has this been pointed out and discussed beforehand? No, there is not a BOM-ambiguity. Rather, there is an English

Fwd: Re: BOM ambiguity?

2012-07-13 Thread Stephan Stiller
PS: I mean, what you (Ken W) are writing is an argument for documenting the format outside of the file proper, and that's good, but then one wouldn't/shouldn't use a BOM in the first place. So if one uses the BOM as a format indicator (not a perfect situation, I understand), that often