On 16 May 2017, at 17:23, Hans Åberg wrote:
>
> HFS implements case insensitivity in a layer above the filesystem raw
> functions. So it is perfectly possible to have files that differ by case only
> in the same directory by using low level function calls. The Tenon MachTen
> did that on Mac O
> On 16 May 2017, at 18:13, Alastair Houghton
> wrote:
>
> On 16 May 2017, at 17:07, Hans Åberg wrote:
>>
> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on
> UCS-2/UTF-16. ...
The filesystem directory is using octet sequences and does not bother
On 16 May 2017, at 17:07, Hans Åberg wrote:
>
HFS(+), NTFS and VFAT long filenames are all encoded in some variation on
UCS-2/UTF-16. ...
>>>
>>> The filesystem directory is using octet sequences and does not bother
>>> passing over an encoding, I am told. Someone could remember one
> On 16 May 2017, at 17:52, Alastair Houghton
> wrote:
>
> On 16 May 2017, at 16:44, Hans Åberg wrote:
>>
>> On 16 May 2017, at 17:30, Alastair Houghton via Unicode
>> wrote:
>>>
>>> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on
>>> UCS-2/UTF-16. ...
>>
>> The
On 16 May 2017, at 16:44, Hans Åberg wrote:
>
> On 16 May 2017, at 17:30, Alastair Houghton via Unicode
> wrote:
>>
>> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on
>> UCS-2/UTF-16. ...
>
> The filesystem directory is using octet sequences and does not bother pass
> On 16 May 2017, at 17:30, Alastair Houghton via Unicode
> wrote:
>
> On 16 May 2017, at 14:23, Hans Åberg via Unicode wrote:
>>
>> You don't. You have a filename, which is an octet sequence of unknown
>> encoding, and want to deal with it. Therefore, valid Unicode transformations
>> of the
On 16 May 2017, at 14:23, Hans Åberg via Unicode wrote:
>
> You don't. You have a filename, which is an octet sequence of unknown
> encoding, and want to deal with it. Therefore, valid Unicode transformations
> of the filename may result in it not being reachable.
>
> It only matters th
2017-05-16 15:23 GMT+02:00 Hans Åberg :
> All current filesystems, as far as experts could recall, use octet
> sequences at the lowest level; whatever encoding is used is built in a
> layer above
>
Not NTFS (on Windows), which uses sequences of 16-bit units. The same goes for
FAT32/exFAT within "Long File
> On 16 May 2017, at 15:00, Philippe Verdy wrote:
>
> 2017-05-16 14:44 GMT+02:00 Hans Åberg via Unicode :
>
> > On 15 May 2017, at 12:21, Henri Sivonen via Unicode
> > wrote:
> ...
> > I think Unicode should not adopt the proposed change.
>
> It would be useful, for use with filesystems, to
On Tue, 16 May 2017 14:44:44 +0200
Hans Åberg via Unicode wrote:
> > On 15 May 2017, at 12:21, Henri Sivonen via Unicode
> > wrote:
> ...
> > I think Unicode should not adopt the proposed change.
>
> It would be useful, for use with filesystems, to have Unicode
> codepoint markers that indi
On Tue, 16 May 2017 20:08:52 +0900
"Martin J. Dürst via Unicode" wrote:
> I agree with others that ICU should not be considered to have a
> special status, it should be just one implementation among others.
> [The next point is a side issue, please don't spend too much time on
> it.] I find it
2017-05-16 14:44 GMT+02:00 Hans Åberg via Unicode :
>
> > On 15 May 2017, at 12:21, Henri Sivonen via Unicode
> wrote:
> ...
> > I think Unicode should not adopt the proposed change.
>
> It would be useful, for use with filesystems, to have Unicode codepoint
> markers that indicate how UTF-8, inc
> On 15 May 2017, at 12:21, Henri Sivonen via Unicode
> wrote:
...
> I think Unicode should not adopt the proposed change.
It would be useful, for use with filesystems, to have Unicode codepoint markers
that indicate how UTF-8, including non-valid sequences, is translated into
UTF-32 in a way
2017-05-16 12:40 GMT+02:00 Henri Sivonen via Unicode :
> > One additional note: the standard codifies this behaviour as a
> *recommendation*, not a requirement.
>
> This is an odd argument in favor of changing it. If the argument is
> that it's just a recommendation that you don't need to adhere t
Hello everybody,
[using this mail to in effect reply to different mails in the thread]
On 2017/05/16 17:31, Henri Sivonen via Unicode wrote:
On Tue, May 16, 2017 at 10:22 AM, Asmus Freytag wrote:
Under what circumstance would it matter how many U+FFFDs you see?
Maybe it doesn't, but I don
>
> The proposal actually does cover things that aren’t structurally valid,
> like your e0 e0 e0 example, which it suggests should be a single U+FFFD
> because the initial e0 denotes a three byte sequence, and your 80 80 80
> example, which it proposes should constitute three illegal subsequences
>
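To make the counting question concrete, here is a minimal sketch that asks a decoder how many U+FFFD it emits for the byte sequences mentioned above. It relies on CPython 3's built-in UTF-8 codec with `errors="replace"`, which, as far as I know, replaces each maximal ill-formed subpart separately; that is an assumption about one implementation, not a statement of what the proposal or the standard requires. The helper name `count_replacements` is made up for illustration.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def count_replacements(raw: bytes) -> int:
    """Decode raw as UTF-8, substituting U+FFFD for ill-formed input,
    and return how many U+FFFD characters the result contains."""
    return raw.decode("utf-8", errors="replace").count(REPLACEMENT)


if __name__ == "__main__":
    # The two examples from the quoted text, plus an over-long-style prefix.
    for hexbytes in ("e0e0e0", "808080", "e08080"):
        sample = bytes.fromhex(hexbytes)
        print(hexbytes, "->", count_replacements(sample), "x U+FFFD")
```

Running the same inputs through a different decoder (ICU, a browser, another language runtime) is a quick way to see whether implementations actually agree on the number of replacement characters.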
On Tue, May 16, 2017 at 1:09 PM, Alastair Houghton
wrote:
> On 16 May 2017, at 09:31, Henri Sivonen via Unicode
> wrote:
>>
>> On Tue, May 16, 2017 at 10:42 AM, Alastair Houghton
>> wrote:
>>> That would be true if the in-memory representation had any effect on what
>>> we’re talking about, bu
On 16 May 2017, at 09:31, Henri Sivonen via Unicode wrote:
>
> On Tue, May 16, 2017 at 10:42 AM, Alastair Houghton
> wrote:
>> That would be true if the in-memory representation had any effect on what
>> we’re talking about, but it really doesn’t.
>
> If the internal representation is UTF-16 (
> On 16 May 2017, at 10:29, David Starner wrote:
>
> On Tue, May 16, 2017 at 1:45 AM Alastair Houghton
> wrote:
> That’s true anyway; imagine the database holds raw bytes, that just happen to
> decode to U+FFFD. There might seem to be *two* names that both contain
> U+FFFD in the same place
On Tue, May 16, 2017 at 1:45 AM Alastair Houghton <
alast...@alastairs-place.net> wrote:
> That’s true anyway; imagine the database holds raw bytes, that just happen
> to decode to U+FFFD. There might seem to be *two* names that both contain
> U+FFFD in the same place. How do you distinguish bet
> On 16 May 2017, at 09:18, David Starner wrote:
>
> On Tue, May 16, 2017 at 12:42 AM Alastair Houghton
> wrote:
>> If you’re about to mutter something about security, consider this: security
>> code *should* refuse to compare strings that contain U+FFFD (or at least
>> should never treat th
On Tue, May 16, 2017 at 10:22 AM, Asmus Freytag wrote:
> but I think the way he raises this point is needlessly antagonistic.
I apologize. My level of dismay at the proposal's ICU-centricity overcame me.
On Tue, May 16, 2017 at 10:42 AM, Alastair Houghton
wrote:
> That would be true if the in-m
On Tue, May 16, 2017 at 12:42 AM Alastair Houghton <
alast...@alastairs-place.net> wrote:
> If you’re about to mutter something about security, consider this:
> security code *should* refuse to compare strings that contain U+FFFD (or at
> least should never treat them as equal, even to themselves)
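One way to read the suggestion above is sketched below: a comparison helper that refuses to report equality whenever either input is ill-formed UTF-8 or already contains U+FFFD, even when both inputs are byte-for-byte identical. The function name `strict_equal` and the choice to take raw bytes are assumptions made for the example, not anything proposed in the thread.

```python
def strict_equal(a: bytes, b: bytes) -> bool:
    """Compare two UTF-8 byte strings for a security-sensitive check.

    Returns False if either input is ill-formed UTF-8 or contains
    U+FFFD, so a damaged string never matches anything, itself included.
    """
    try:
        left = a.decode("utf-8")   # strict decoding: raises on ill-formed input
        right = b.decode("utf-8")
    except UnicodeDecodeError:
        return False
    if "\ufffd" in left or "\ufffd" in right:
        return False
    return left == right


if __name__ == "__main__":
    print(strict_equal(b"admin", b"admin"))
    print(strict_equal(b"adm\xffn", b"adm\xffn"))
```

With a check like this, how many U+FFFD a lossy decoder would have produced no longer matters for the comparison, because lossy decoding never takes place.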
On Tue, 16 May 2017 10:01:03 +0300
Henri Sivonen via Unicode wrote:
> Even so, I think even changing a recommendation of "best practice"
> needs way better rationale than "feels right" or "ICU already does it"
> when a) major browsers (which operate in the most prominent
> environment of broken a
On Mon, May 15, 2017 at 11:50 PM, Henri Sivonen via Unicode <
unicode@unicode.org> wrote:
> On Tue, May 16, 2017 at 1:16 AM, Shawn Steele via Unicode
> wrote:
> > I’m not sure how the discussion of “which is better” relates to the
> > discussion of ill-formed UTF-8 at all.
>
> Clearly, the "which
On 16 May 2017, at 08:22, Asmus Freytag via Unicode wrote:
> I therefore think that Henri has a point when he's concerned about tacit
> assumptions favoring one memory representation over another, but I think the
> way he raises this point is needlessly antagonistic.
That would be true if the
On 15 May 2017, at 23:43, Richard Wordingham via Unicode
wrote:
>
> The problem with surrogates is inadequate testing. They're sufficiently
> rare for many users that it may be a long time before an error is
> discovered. It's not always obvious that code is designed for UCS-2
> rather than UT
On Tue, May 16, 2017 at 9:50 AM, Henri Sivonen wrote:
> Consider https://hsivonen.com/test/moz/broken-utf-8.html . A quick
> test with three major browsers that use UTF-16 internally and have
> independent (of each other) implementations of UTF-8 decoding
> (Firefox, Edge and Chrome) shows agreeme
On 5/15/2017 11:50 PM, Henri Sivonen via Unicode wrote:
On Tue, May 16, 2017 at 1:16 AM, Shawn Steele via Unicode
wrote:
I’m not sure how the discussion of “which is better” relates to the
discussion of ill-formed UTF-8 at all.
Clearly, the "which is better" issue is distracting from the
under
On 15 May 2017, at 23:16, Shawn Steele via Unicode wrote:
>
> I’m not sure how the discussion of “which is better” relates to the
> discussion of ill-formed UTF-8 at all.
It doesn’t, which is a point I made in my original reply to Henri. The only
reason I answered his anti-UTF-16 rant at all
On Tue, May 16, 2017 at 6:23 AM, Karl Williamson
wrote:
> On 05/15/2017 04:21 AM, Henri Sivonen via Unicode wrote:
>>
>> In reference to:
>> http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
>>
>> I think Unicode should not adopt the proposed change.
>>
>> The proposal is to make ICU's spe
On Tue, May 16, 2017 at 1:16 AM, Shawn Steele via Unicode
wrote:
> I’m not sure how the discussion of “which is better” relates to the
> discussion of ill-formed UTF-8 at all.
Clearly, the "which is better" issue is distracting from the
underlying issue. I'll clarify what I meant on that point an
On 05/15/2017 04:21 AM, Henri Sivonen via Unicode wrote:
In reference to:
http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
I think Unicode should not adopt the proposed change.
The proposal is to make ICU's spec violation conforming. I think there
is both a technical and a political re
Software designed with only UCS-2 support, rather than real UTF-16, is still
in use today.
For example, MySQL with its broken "UTF-8" encoding which in fact encodes
supplementary characters as two separate 16-bit code-units for surrogates,
each one blindly encoded as 3-byte sequences which would be ill-fo
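The byte pattern described here (each UTF-16 surrogate encoded on its own as a 3-byte sequence) can be reproduced with a short sketch. It illustrates the encoding scheme only and makes no claim about what any particular database actually stores; the helper name `encode_surrogates_separately` is hypothetical.

```python
def encode_surrogates_separately(text: str) -> bytes:
    """Encode text by converting it to UTF-16 code units first and then
    encoding every 16-bit unit on its own, surrogates included."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets Python emit the 3-byte form for a lone surrogate.
        out += chr(unit).encode("utf-8", errors="surrogatepass")
    return bytes(out)


def is_well_formed_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    ch = "\U0001F600"  # a supplementary-plane character
    print(ch.encode("utf-8").hex())                # well-formed 4-byte form
    print(encode_surrogates_separately(ch).hex())  # two 3-byte sequences
    print(is_well_formed_utf8(encode_surrogates_separately(ch)))
```

For characters in the Basic Multilingual Plane the two encoders produce identical bytes, which is why the difference tends to go unnoticed until supplementary characters appear.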
2017-05-15 19:54 GMT+02:00 Asmus Freytag via Unicode :
> I think this political reason should be taken very seriously. There are
> already too many instances where ICU can be seen "driving" the development
> of property and algorithms.
>
> Those involved in the ICU project may not see the problem,
On Mon, 15 May 2017 21:38:26 +
David Starner via Unicode wrote:
> > and the fact is that handling surrogates (which is what proponents
> > of UTF-8 or UCS-4 usually focus on) is no more complicated than
> > handling combining characters, which you have to do anyway.
> Not necessarily; you ca
boun...@unicode.org] On Behalf Of David Starner
via Unicode
Sent: Monday, May 15, 2017 2:38 PM
To: unicode@unicode.org
Subject: Re: Feedback on the proposal to change U+FFFD generation when decoding
ill-formed UTF-8
On Mon, May 15, 2017 at 8:41 AM Alastair Houghton via Unicode
mailto:unicode@unicode
On Mon, May 15, 2017 at 8:41 AM Alastair Houghton via Unicode <
unicode@unicode.org> wrote:
> Yes, UTF-8 is more efficient for primarily ASCII text, but that is not the
> case for other situations
UTF-8 is clearly more efficient space-wise for text that includes more ASCII
characters than characters betw
On 5/15/2017 11:33 AM, Henri Sivonen via Unicode wrote:
ICU uses UTF-16 as its in-memory Unicode representation, so ICU isn't
representative of implementation concerns of implementations that use
UTF-8 as their in-memory Unicode representation.
Even though there are notable systems (Win32, Jav
>> Disagree. An over-long UTF-8 sequence is clearly a single error. Emitting
>> multiple errors there makes no sense.
>
> Changing a specification as fundamental as this is something that should not
> be undertaken lightly.
IMO, the only thing that can be agreed upon is that "something's bad
On Mon, May 15, 2017 at 6:37 PM, Alastair Houghton
wrote:
> On 15 May 2017, at 11:21, Henri Sivonen via Unicode
> wrote:
>>
>> In reference to:
>> http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
>>
>> I think Unicode should not adopt the proposed change.
>
> Disagree. An over-long UTF
On 15 May 2017, at 18:52, Asmus Freytag wrote:
>
> On 5/15/2017 8:37 AM, Alastair Houghton via Unicode wrote:
>> On 15 May 2017, at 11:21, Henri Sivonen via Unicode
>> wrote:
>>> In reference to:
>>> http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
>>>
>>> I think Unicode should not a
On 5/15/2017 3:21 AM, Henri Sivonen via Unicode wrote:
Second, the political reason:
Now that ICU is a Unicode Consortium project, I think the Unicode
Consortium should be particularly sensitive to biases arising from being
both the source of the spec and the source of a popular
implementation. It
On 5/15/2017 8:37 AM, Alastair Houghton via Unicode wrote:
On 15 May 2017, at 11:21, Henri Sivonen via Unicode wrote:
In reference to:
http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
I think Unicode should not adopt the proposed change.
Disagree. An over-long UTF-8 sequence is clea
On 15 May 2017, at 11:21, Henri Sivonen via Unicode wrote:
>
> In reference to:
> http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
>
> I think Unicode should not adopt the proposed change.
Disagree. An over-long UTF-8 sequence is clearly a single error. Emitting
multiple errors ther
In reference to:
http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
I think Unicode should not adopt the proposed change.
The proposal is to make ICU's spec violation conforming. I think there
is both a technical and a political reason why the proposal is a bad
idea.
First, the technical