RE: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences

2020-08-17 Thread Shawn Steele via Unicode
IMO, encodings, particularly ones depending on state such as this, may have multiple ways to output the same, or similar, sequences. Which means that pretty much any time an encoding transforms data, any previous security or other validation-style checks are no longer valid and any
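To illustrate the point above, here is a minimal sketch using Python's built-in `iso2022_jp` codec. It shows two different byte streams that decode to the same text, so a check made on the raw bytes says nothing about the decoded result. The `naive_byte_filter` helper is hypothetical and exists only for illustration; it is not taken from the thread.

```python
# Two different ISO-2022-JP byte streams for the same text.
plain = b"A"
with_escape = b"\x1b(BA"  # ESC ( B redundantly re-selects ASCII before "A"

decoded_plain = plain.decode("iso2022_jp")
decoded_escape = with_escape.decode("iso2022_jp")


def naive_byte_filter(raw: bytes) -> bool:
    """Hypothetical pre-decode check: accept input unless its raw bytes start with b"A"."""
    return not raw.startswith(b"A")


# The filter rejects the plain form but accepts the escaped form,
# even though both decode to the same string.
print(decoded_plain == decoded_escape)
print(naive_byte_filter(plain), naive_byte_filter(with_escape))
```

The takeaway is only that validation has to be repeated (or done) after decoding; which checks are appropriate depends on the application.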

RE: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Shawn Steele via Unicode
I'm not opposed to a sub-block for "Modern Hieroglyphs". I confess that even though I know nothing about Hieroglyphs, I find it fascinating that such a thoroughly dead script might still be living in some way, even if it's only a little bit. -Shawn -Original Message- From:

RE: Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Shawn Steele via Unicode
> From the point of view of Unicode, it is simpler: If the character is in use > or have had use, it should be included somehow. That bar, to me, seems too low. Many things are only used briefly or in a private context that doesn't really require encoding. The hieroglyphs discussion is

RE: Unicode "no-op" Character?

2019-07-03 Thread Shawn Steele via Unicode
I think you're overstating my concern :) I meant that those things tend to be particular to a certain context and often aren't interesting for interchange. A text editor might find it convenient to place word boundaries in the middle of something another part of the system thinks is a single

RE: Unicode "no-op" Character?

2019-06-23 Thread Shawn Steele via Unicode
But... it's not actually discardable. The hypothetical "packet" architecture (using the term architecture somewhat loosely) needed the information being tunneled in by this character. If it was actually discardable, then the "noop" character wouldn't be required as it would be discarded.

RE: Unicode "no-op" Character?

2019-06-22 Thread Shawn Steele via Unicode
Assuming you were using any of those characters as "markup", how would you know when they were intentionally in the string and not part of your marking system? -Original Message- From: Unicode On Behalf Of Richard Wordingham via Unicode Sent: Saturday, June 22, 2019 4:17 PM To:

RE: Unicode "no-op" Character?

2019-06-22 Thread Shawn Steele via Unicode
+ the list. For some reason the list's reply header is confusing. From: Shawn Steele Sent: Saturday, June 22, 2019 4:55 PM To: Sławomir Osipiuk Subject: RE: Unicode "no-op" Character? The original comment about putting it between the base character and the combining diacritic seems peculiar.

RE: Unicode "no-op" Character?

2019-06-21 Thread Shawn Steele via Unicode
I'm curious what you'd use it for? From: Unicode On Behalf Of Slawomir Osipiuk via Unicode Sent: Friday, June 21, 2019 5:14 PM To: unicode@unicode.org Subject: Unicode "no-op" Character? Does Unicode include a character that does nothing at all? I'm talking about something that can be used

RE: NNBSP

2019-01-18 Thread Shawn Steele via Unicode
>> If they are obsolete apps, they don’t use CLDR / ICU, as these are designed >> for up-to-date and fully localized apps. So one hassle is off the table. Windows uses CLDR/ICU. Obsolete apps run on Windows. That statement is a little narrow-minded. >> I didn’t look into these date

RE: NNBSP

2019-01-18 Thread Shawn Steele via Unicode
>> Keeping these applications outdated has no other benefit than providing a >> handy lobbying tool against support of NNBSP. I believe you’ll find that there are some French banks and other institutions that depend on such obsolete applications (unfortunately). Additionally, I believe you’ll

RE: NNBSP

2019-01-18 Thread Shawn Steele via Unicode
I've been lurking on this thread a little. This discussion has gone “all over the place”; however, I’d like to point out that part of the reason NBSP has been used for thousands separators is that it exists in all of those legacy codepages that were mentioned predating Unicode. Whether

RE: Why so much emoji nonsense? - Proscription

2018-02-15 Thread Shawn Steele via Unicode
b 2018 21:38:19 + Shawn Steele via Unicode <unicode@unicode.org> wrote: > I realize "I'd've" isn't > "right", Where did that proscription come from? Is it perhaps a perversion of the proscription of "I'd of"? Richard.

RE: Why so much emoji nonsense?

2018-02-15 Thread Shawn Steele via Unicode
For voice we certainly get clues about the speaker's intent from their tone. That tone can change the meaning of the same written word quite a bit. There is no need for video to wildly change the meaning of two different readings of the exact same words. Writers have always taken liberties

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-06-01 Thread Shawn Steele via Unicode
@unicode.org Subject: Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8 On 6/1/2017 10:41 AM, Shawn Steele via Unicode wrote: I think that the (or a) key problem is that the current "best practice" is treated as "SHOULD" in RFC parlance. W

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-06-01 Thread Shawn Steele via Unicode
soft.com> Subject: Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8 On 1 Jun 2017, at 10:32, Henri Sivonen via Unicode <unicode@unicode.org> wrote: > > On Wed, May 31, 2017 at 10:42 PM, Shawn Steele via Unicode > <unicode@unicode.org>

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Shawn Steele via Unicode
> And *that* is what the specification says. The whole problem here is that > someone elevated > one choice to the status of “best practice”, and it’s a choice that some of > us don’t think *should* > be considered best practice. > Perhaps “best practice” should simply be altered to say that

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Shawn Steele via Unicode
> it’s more meaningful for whoever sees the output to see a single U+FFFD > representing > the illegally encoded NUL that it is to see two U+FFFDs, one for an invalid > lead byte and > then another for an “unexpected” trailing byte. I disagree. It may be more meaningful for some applications

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Shawn Steele via Unicode
> For implementations that emit FFFD while handling text conversion and repair > (ie, converting ill-formed > UTF-8 to well-formed), it is best for interoperability if they get the same > results, so that indices within the > resulting strings are consistent across implementations for all the

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Shawn Steele via Unicode
> > In either case, the bad characters are garbage, so neither approach is > > "better" - except that one or the other may be more conducive to the > > requirements of the particular API/application. > There's a potential issue with input methods that indirectly edit the backing > store. For

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Shawn Steele via Unicode
> Until TUS 3.1, it was legal for UTF-8 parsers to treat the sequence > as U+002F. Sort of, maybe. It was not legal for them to generate it though. So you could kind of infer that it was not a legal sequence. -Shawn
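As a small sketch of the distinction between parsing and generating: the sequence in the quoted text is elided, so this assumes it is a non-shortest ("overlong") form such as `C0 AF`, which a lenient parser could once have read as U+002F. A conforming encoder only ever produces the shortest form, and Python's strict UTF-8 decoder rejects the overlong one. The helper name `decodes_strictly` is made up for this example.

```python
# Assumed example of an overlong encoding of "/" (U+002F); the thread's
# actual byte sequence is not shown in the snippet.
overlong_slash = b"\xc0\xaf"


def decodes_strictly(raw: bytes) -> bool:
    """Return True if raw is accepted by Python's strict UTF-8 decoder."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# An encoder only ever generates the one-byte shortest form.
shortest = "/".encode("utf-8")

print(shortest)
print(decodes_strictly(shortest), decodes_strictly(overlong_slash))
```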

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Shawn Steele via Unicode
> Which is to completely reverse the current recommendation in Unicode 9.0. > While I agree that this might help you fending off a bug report, it would > create chances for bug reports for Ruby, Python3, many if not all Web > browsers,... & Windows & .Net Changing the behavior of the Windows

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Shawn Steele via Unicode
> I think nobody is debating that this is *one way* to do things, and that some > code does it. Except that they sort of are. The premise is that the "old language was wrong", and the "new language is right." The reason we know the old language was wrong was that there was a bug filed

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-26 Thread Shawn Steele via Unicode
So basically this came about because code got bugged for not following the "recommendation." To fix that, the recommendation will be changed. However, that is then going to lead to bugs for other existing code that does not follow the new recommendation. I totally get the forward/backward

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-23 Thread Shawn Steele via Unicode
> If the thread has made one thing clear is that there's no consensus in the > wider community > that one approach is obviously better. When it comes to ill-formed sequences, > all bets are off. > Simple as that. > Adding a "recommendation" this late in the game is just bad standards policy. I

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-23 Thread Shawn Steele via Unicode
+ the list, which somehow my reply seems to have lost. > I may have missed something, but I think nobody actually proposed to change > the recommendations into requirements No thanks, that would be a breaking change for some implementations (like mine) and force them to become non-complying or

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Shawn Steele via Unicode
> Faster ok, provided this does not break other uses, notably for random > access within strings… Either way, this is a “recommendation”. I don’t see how that can provide for not-“breaking other uses.” If it’s internal, you can do what you will, so if you need the 1:1 seeming parity, then

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Shawn Steele via Unicode
But why change a recommendation just because it “feels like”. As you said, it’s just a recommendation, so if that really annoyed someone, they could do something else (eg: they could use a single FFFD). If the recommendation is truly that meaningless or arbitrary, then we just get into silly

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Shawn Steele via Unicode
to:unicode-boun...@unicode.org] On Behalf Of Richard Wordingham via Unicode Sent: Tuesday, May 16, 2017 10:58 AM To: unicode@unicode.org Subject: Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8 On Tue, 16 May 2017 17:30:01 +0000 Shawn Steele via Unicod

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Shawn Steele via Unicode
> Would you advocate replacing > e0 80 80 > with > U+FFFD U+FFFD U+FFFD (1) > rather than > U+FFFD (2) > It’s pretty clear what the intent of the encoder was there, I’d say, and > while we certainly don’t > want to decode it as a NUL (that was the source of
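The two options in the question above can be compared concretely. This is a minimal sketch, not a statement of what any standard requires: CPython's built-in UTF-8 decoder with `errors="replace"` happens to produce option (1) for `e0 80 80`, and option (2) is approximated here by a hypothetical `collapse_replacements` helper that merges runs of U+FFFD. That helper is a crude stand-in, since it would also merge adjacent but unrelated errors.

```python
import re

ill_formed = b"\xe0\x80\x80"  # overlong attempt at encoding NUL

# Option (1): one U+FFFD per rejected byte, as CPython's decoder does here.
per_byte = ill_formed.decode("utf-8", errors="replace")


def collapse_replacements(text: str) -> str:
    """Hypothetical post-processing: merge each run of U+FFFD into one."""
    return re.sub("\ufffd+", "\ufffd", text)


# Option (2), approximated: a single U+FFFD for the whole bad sequence.
collapsed = collapse_replacements(per_byte)

print(len(per_byte), len(collapsed))
```

Because the two results differ in length, string indices computed after replacement will not line up between implementations that pick different options.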

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-15 Thread Shawn Steele via Unicode
>> Disagree. An over-long UTF-8 sequence is clearly a single error. Emitting >> multiple errors there makes no sense. > > Changing a specification as fundamental as this is something that should not > be undertaken lightly. IMO, the only thing that can be agreed upon is that "something's bad