To: Michael Everson ever...@evertype.com
On 13 Jul 2012, at 00:34, Michael Everson wrote:
On 12 Jul 2012, at 23:27, Hans Aberg wrote:
On 12 Jul 2012, at 23:47, Michael Everson wrote:
...
Is it in print?
...
If so, then it should
On 13 Jul 2012, at 09:49, Hans Aberg wrote:
Local documents on your computer don't do me any good.
FYI, in the TeX world, one can go to CTAN http://ctan.org/ and run a
search at http://ctan.org/search/. However, with the TeX Live package
http://www.tug.org/texlive/ installed, that is
On 2012/07/13 0:12, Leif Halvard Silli wrote:
Doug Ewell, Wed, 11 Jul 2012 09:12:46 -0600:
and people who want to create or modify UTF-8 files which will
be consumed by a process that is intolerant of the signature
should not use Notepad. That goes for HTML (pre-5) pages [snip]
HTML5-parsers
On 2012-07-12, Michael Everson ever...@evertype.com wrote:
On 12 Jul 2012, at 22:20, Julian Bradfield wrote:
But wanting to do so would be crazy. My mu-nu ligature is, as far as I know,
used only by me (and co-authors who let me do the typesetting), and so if
Unicode has any sanity left, it
On 13 Jul 2012, at 11:07, Julian Bradfield wrote:
On 2012-07-12, Michael Everson ever...@evertype.com wrote:
On 12 Jul 2012, at 22:20, Julian Bradfield wrote:
But wanting to do so would be crazy. My mu-nu ligature is, as far as I
know, used only by me (and co-authors who let me do the
To: Michael Everson ever...@evertype.com
On 13 Jul 2012, at 10:57, Michael Everson wrote:
On 13 Jul 2012, at 09:49, Hans Aberg wrote:
Local documents on your
On 2012-07-13, Michael Everson ever...@evertype.com wrote:
On 13 Jul 2012, at 11:07, Julian Bradfield wrote:
So... U+1D7CC MATHEMATICAL ITALIC SMALL MU NU LIGATURE, since it's published
and (assuming the work is worthy; I cannot judge) might be cited by others.
It *might*, by some hapless
Leif Halvard Silli, Fri, 13 Jul 2012 13:44:42 +0200:
At least, I do not think that user agents that
want to be conforming pre-HTML5 user agents have any justification for
ignoring the BOM.
* The effect of the BOM - as encoding signature - is not discussed
anywhere in HTML4 or in the
The time to encode this ad-hoc symbol would arrive some time after
others republish your proof *without* choosing a different symbol...at
which point it would have become part of a convention.
A./
On 7/13/2012 5:20 AM, Julian Bradfield wrote:
On 2012-07-13, Michael Everson
On 7/13/2012 3:07 AM, Julian Bradfield wrote:
My colleagues in the Edinburgh PEPA group did try to get their pet symbol
encoded (a bowtie where the two triangles overlap somewhat rather than just
touching), but were refused; although that symbol now appears in hundreds of
papers by dozens of
On 7/13/2012 1:57 AM, Michael Everson wrote:
That document is 164 pages long. I would be interested in examining it
after someone else has done the background work of a first pass at
identifying which characters are already encoded. This is sort of an
emoji/wingdings/webdings scenario, I
2012-07-13 16:12, Leif Halvard Silli wrote:
The kind of BOM intolerance I know about in user agents is that some
text browsers and IE5 for Mac (abandoned) convert the BOM into a
(typically empty) line at the start of the body element.
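For illustration, here is a minimal Python sketch (an editorial example, not code from any browser) of what a signature-intolerant reader sees. A plain UTF-8 decoder keeps the BOM as U+FEFF at the start of the text, which a renderer may then show as stray whitespace or an empty line; Python's standard `utf-8-sig` codec strips one leading signature instead. The sample bytes are assumed.

```python
# Assumed sample: a page whose bytes begin with the UTF-8 signature EF BB BF.
raw = b"\xef\xbb\xbf<p>Hello</p>"

# A reader that decodes as plain UTF-8 keeps the signature as U+FEFF
# (ZERO WIDTH NO-BREAK SPACE) in front of the markup.
naive = raw.decode("utf-8")

# The "utf-8-sig" codec removes a single leading signature if present.
tolerant = raw.decode("utf-8-sig")

print(repr(naive))     # '\ufeff<p>Hello</p>'
print(repr(tolerant))  # '<p>Hello</p>'
```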
I wonder if there is any evidence of browsers currently in
The TeX collection includes things which are not only mathematical symbols. No
need to be so dismissive, Asmus.
On 13 Jul 2012, at 14:24, Asmus Freytag wrote:
On 7/13/2012 1:57 AM, Michael Everson wrote:
That document is 164 pages long. I would be interested in examining it after
someone
Philippe Verdy verd...@wanadoo.fr wrote:
|2012/7/12 Steven Atreju snatr...@googlemail.com:
| UTF-8 is a bytestream, not multioctet(/multisequence).
|Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
|bytes. It has a lot of internal semantics and constraints. Some things
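The point that UTF-8 carries internal constraints can be made concrete. The sketch below assumes Python 3's strict `utf-8` codec as the reference decoder and shows octet sequences that are not well-formed UTF-8: an overlong form, an encoded surrogate, a truncated sequence and a stray continuation byte. The helper name `is_valid_utf8` is illustrative only.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 according to Python's strict decoder."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True

samples = {
    "ASCII": b"abc",                       # valid
    "two-byte e-acute": b"\xc3\xa9",       # valid (U+00E9)
    "overlong NUL": b"\xc0\x80",           # invalid: C0 can never start a sequence
    "encoded surrogate": b"\xed\xa0\x80",  # invalid: U+D800 is not a scalar value
    "truncated sequence": b"\xe2\x82",     # invalid: three-byte sequence cut short
    "lone continuation": b"\x80",          # invalid: continuation byte with no lead
}

for name, data in samples.items():
    print(f"{name:20} {is_valid_utf8(data)}")
```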
You sum up my views.
The warnings appear as routine.
Leif
--- Original message ---
From: Jukka K. Korpela jkorp...@cs.tut.fi
To: unicode@unicode.org
Sent: 13/7/'12, 15:31
2012-07-13 16:12, Leif Halvard Silli wrote:
The kind of BOM intolerance I know about in user agents is
2012/7/13 Steven Atreju snatr...@googlemail.com:
Philippe Verdy verd...@wanadoo.fr wrote:
|2012/7/12 Steven Atreju snatr...@googlemail.com:
| UTF-8 is a bytestream, not multioctet(/multisequence).
|Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
|bytes. It has a lot
Another myth, e.g. on Wikipedia, is that Unicode warns against the UTF-8 BOM;
see the footnote
en.m.wikipedia.org/wiki/UTF-8#cite_note-27
Leif
--- Original message ---
From: Jukka K. Korpela jkorp...@cs.tut.fi
To: unicode@unicode.org
Sent: 13/7/'12, 15:31
2012-07-13 16:12,
From: Jukka K. Korpela jkorp...@cs.tut.fi
When the BOM is used in web pages or editors for UTF-8 encoded content it
can sometimes introduce blank spaces or short sequences of strange-looking
characters (such as ). For this reason, it is usually best for
interoperability to omit the BOM,
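The "strange-looking characters" in that quotation appear when the three signature bytes EF BB BF are decoded with a single-byte charset. A minimal Python sketch, assuming Latin-1 or Windows-1252 as the wrongly applied charset:

```python
# The UTF-8 encoding of U+FEFF is the three-byte signature EF BB BF.
bom = "\ufeff".encode("utf-8")

# Decoding those bytes with a single-byte charset yields three visible
# characters instead of one invisible signature.
as_latin1 = bom.decode("latin-1")
as_cp1252 = bom.decode("cp1252")

print(bom)        # b'\xef\xbb\xbf'
print(as_latin1)  # ï»¿
print(as_cp1252)  # ï»¿
```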
Date: Fri, 13 Jul 2012 16:04:44 +0200
From: Steven Atreju snatr...@googlemail.com
For example, this mail is
written in a UTF-8-enabled vi(1) basically from 1986, in UTF-8
encoding («Schöne Überraschung, gelle?»
No, it isn't:
User-Agent: S-nail 12.5 7/5/10;s-nail-9-g517ac44-dirty
On Fri, Jul 13, 2012 at 9:11 AM, Leif H Silli
xn--mlform-...@xn--mlform-iua.no wrote:
Another myth, e.g. on Wikipedia, is that Unicode warns against the UTF-8
BOM; see the footnote
en.m.wikipedia.org/wiki/UTF-8#cite_note-27
Wikipedia says "The Unicode standard recommends against the BOM for
Eli Zaretskii e...@gnu.org wrote:
| For example, this mail is
| written in a UTF-8-enabled vi(1) basically from 1986, in UTF-8
| encoding («Schöne Überraschung, gelle?»
|
|No, it isn't:
|
|Content-Type: text/plain; charset=ISO-8859-1
Oh, it's really terrible. I do have
2012-07-13 22:37, David Starner wrote:
Wikipedia says "The Unicode standard recommends against the BOM for
UTF-8." and refers to page 30 of the Unicode Standard, version 6.0,
that says "Use of a BOM is neither required nor recommended for
UTF-8"... Calling it a myth seems bizarre.
“Not
Philippe Verdy verd...@wanadoo.fr wrote:
|2012/7/13 Steven Atreju snatr...@googlemail.com:
| Philippe Verdy verd...@wanadoo.fr wrote:
|
| |2012/7/12 Steven Atreju snatr...@googlemail.com:
| | UTF-8 is a bytestream, not multioctet(/multisequence).
| |Not even. UTF-8 is a text-stream, not
As an aside to the BOM discussion - something I've always been meaning
to ask.
So there is a BOM-ambiguity when a file starts with
FF FE
and then a couple of U+ characters, yes? Because this could be
either UTF-16 or UTF-32 under little-endianness. Has this been pointed
out and
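The ambiguity described here can be reproduced with Python's BOM-aware `utf-16` and `utf-32` codecs, which choose the byte order from the signature and then discard it. The byte strings are made-up samples; the sketch only shows that the same bytes decode without error under both interpretations.

```python
# FF FE 00 00 is the UTF-32LE BOM, but it is also the UTF-16LE BOM
# (FF FE) followed by one U+0000 (00 00).
data = b"\xff\xfe\x00\x00"
print(repr(data.decode("utf-16")))  # '\x00'
print(repr(data.decode("utf-32")))  # ''

# The same holds when real content follows: "A" encoded as UTF-32LE.
longer = b"\xff\xfe\x00\x00" + "A".encode("utf-32-le")
print(repr(longer.decode("utf-16")))  # '\x00A\x00'
print(repr(longer.decode("utf-32")))  # 'A'
```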
On 7/13/2012 6:37 AM, Michael Everson wrote:
The TeX collection includes things which are not only mathematical symbols. No
need to be so dismissive, Asmus.
No need to be so ... - my comment was carefully worded to apply
explicitly to mathematical usage only - and was issued in the context
Null characters are almost always avoided in interchanged plain texts.
This is not a practical problem. The use of nulls as significant
characters is extremely exceptional, as they almost always require an
envelope format to specify data lengths. This envelope format is in a
file that is not
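The remark that significant nulls need an envelope format recording data lengths can be illustrated with a hypothetical length-prefixed record. The framing below (a 4-byte big-endian length followed by a UTF-8 payload) and the names `frame` and `unframe` are assumptions for illustration, not a format discussed on the list. Because the reader relies on the length, a payload containing U+0000 survives intact, whereas NUL-terminated framing would cut it short.

```python
import struct

def frame(text: str) -> bytes:
    """Hypothetical envelope: 4-byte big-endian length, then the UTF-8 payload."""
    payload = text.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def unframe(buf: bytes):
    """Read one record from buf; return (text, remaining_bytes)."""
    if len(buf) < 4:
        raise ValueError("missing length prefix")
    (length,) = struct.unpack_from(">I", buf, 0)
    end = 4 + length
    if len(buf) < end:
        raise ValueError("truncated record")
    return buf[4:end].decode("utf-8"), buf[end:]

stream = frame("a\x00b") + frame("next")
first, rest = unframe(stream)
second, rest = unframe(rest)
print(repr(first), repr(second), rest)  # 'a\x00b' 'next' b''

# NUL-terminated framing would stop at the embedded NUL instead.
print("a\x00b".encode("utf-8").split(b"\x00")[0])  # b'a'
```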
Null characters are almost always avoided in interchanged plain texts.
This is not a practical problem. The use of nulls as significant
characters is extremely exceptional
Yes, but still I think that the BOM ambiguity needs to be documented. If
it already is, the documentation isn't visible or
On Fri, Jul 13, 2012 at 1:29 PM, Jukka K. Korpela jkorp...@cs.tut.fi wrote:
2012-07-13 22:37, David Starner wrote:
Wikipedia says "The Unicode standard recommends against the BOM for
UTF-8." and refers to page 30 of the Unicode Standard, version 6.0,
that says "Use of a BOM is neither required
A) treating NUL as ignorable is really deep legacy. Totally no longer
appropriate for modern data.
B) there are many Unicode character codes with leading or trailing or
other NUL bytes, so UTF-16 and UTF-32 cannot be exchanged under the
assumption that NUL is ignorable
A./
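Point B can be checked directly: ordinary characters produce NUL bytes in UTF-16 and UTF-32, so a process that drops NUL bytes as ignorable corrupts the text rather than cleaning it. A minimal Python sketch with an arbitrarily chosen sample string:

```python
# "A" is U+0041 and U+0100 is LATIN CAPITAL LETTER A WITH MACRON;
# both contain a 00 byte in UTF-16LE.
text = "A\u0100"
utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")
print(utf16)  # b'A\x00\x00\x01'
print(utf32)  # b'A\x00\x00\x00\x00\x01\x00\x00'

# Treating NUL as ignorable: strip every 00 byte, then decode again.
stripped = utf16.replace(b"\x00", b"")
print(stripped.decode("utf-16-le"))  # Ł (U+0141), not the original two characters
```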
On 7/13/2012 2:16
2012/7/13 Asmus Freytag asm...@ix.netcom.com:
A) treating NUL as ignorable is really deep legacy. Totally no longer
appropriate for modern data.
I did not say that. But modern data heavily uses bytes as fillers for
padding, or as terminators in various envelope formats. There are
some more
On 7/13/2012 2:42 PM, David Starner wrote:
On Fri, Jul 13, 2012 at 1:29 PM, Jukka K. Korpela jkorp...@cs.tut.fi wrote:
2012-07-13 22:37, David Starner wrote:
Wikipedia says "The Unicode standard recommends against the BOM for
UTF-8." and refers to page 30 of the Unicode Standard, version 6.0,
It would break if the only place to put a BOM were the
start of a file. But if, as I propose, we allow BOMs to occur anywhere,
each one specifying which encoding to use to decode what follows it, even
shell scripts would work (you could place the BOM on a comment line
after a hash symbol, that
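To see why a BOM at the very start of a file breaks shell scripts, note that Unix-like kernels (Linux, for instance) treat a file as an interpreter script only if its first two bytes are `#!`. The Python sketch below imitates that check with a hypothetical helper `has_shebang`; the last case places the signature on a later comment line, as the proposal above suggests, leaving the `#!` line untouched.

```python
def has_shebang(script: bytes) -> bool:
    """Imitate the kernel's test: the first two bytes must be '#!'."""
    return script[:2] == b"#!"

BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF
plain = b"#!/bin/sh\necho hi\n"
bom_first = BOM + plain
bom_in_comment = b"#!/bin/sh\n#" + BOM + b" encoding signature\necho hi\n"

print(has_shebang(plain))           # True
print(has_shebang(bom_first))       # False
print(has_shebang(bom_in_comment))  # True
```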
On 7/13/2012 1:54 PM, Stephan Stiller wrote:
So there is a BOM-ambiguity when a file starts with
FF FE
and then a couple of U+ characters, yes? Because this could be
either UTF-16 or UTF-32 under little-endianness. Has this been pointed
out and discussed beforehand?
No, there is
Just eliminate the cases where you find U+. For plain-text files
they are not useful. If you're trying to guess which encoding is used
in an HTML or XML file, you won't find any null (because they are
invalid in those formats, in all encodings even with ISO-8859-*). In
those conditions, there's
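This heuristic can be sketched as a signature sniffer that tests the four-byte UTF-32 signatures before the two-byte UTF-16 ones, and then rejects any decoded NUL (U+0000) for markup input. The signature table, the UTF-8 fallback and the function names are assumptions made for this example; the sniffing rules in real parsers differ in detail.

```python
# Longest signatures first, so FF FE 00 00 is read as UTF-32LE, not UTF-16LE.
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

def sniff_signature(data: bytes):
    """Return (encoding, signature_length), or (None, 0) if no signature matches."""
    for sig, encoding in SIGNATURES:
        if data.startswith(sig):
            return encoding, len(sig)
    return None, 0

def decode_markup(data: bytes, default: str = "utf-8") -> str:
    """Decode HTML/XML bytes, treating any U+0000 as evidence of a wrong guess."""
    encoding, skip = sniff_signature(data)
    chosen = encoding or default  # assumed fallback when no signature is present
    text = data[skip:].decode(chosen)
    if "\x00" in text:
        raise ValueError(f"U+0000 found after decoding as {chosen}")
    return text

print(sniff_signature(b"\xff\xfe\x00\x00<\x00\x00\x00"))  # ('utf-32-le', 4)
print(sniff_signature(b"\xff\xfe<\x00"))                  # ('utf-16-le', 2)
print(decode_markup(b"\xef\xbb\xbf<p/>"))                 # <p/>
```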
On Jul 13, 2012, at 4:54 PM, Stephan Stiller wrote:
As an aside to the BOM discussion - something I've always been meaning to ask.
So there is a BOM-ambiguity when a file starts with
FF FE
and then a couple of U+ characters, yes? Because this could be either
UTF-16 or UTF-32 under
Hi. I realize that the bidi parentheses algorithm is not currently being
discussed on the list, but wanted to cc the list with my feedback (I've already
sent it to unicode (using the form), but I wanted to make double sure that my
feedback gets to the right place; also I've made a few edits
So there is a BOM-ambiguity when a file starts with
FF FE
and then a couple of U+ characters, yes? Because this could be
either UTF-16 or UTF-32 under little-endianness. Has this been
pointed out and discussed beforehand?
No, there is not a BOM-ambiguity. Rather, there is an English
PS: I mean, what you (Ken W) are writing is an argument for documenting
the format outside of the file proper, and that's good, but then one
wouldn't/shouldn't use a BOM in the first place.
So if one uses the BOM as a format indicator (not a perfect situation, I
understand), that often