FYI, the UTC retracted the following. *[151-C19 <http://www.unicode.org/cgi-bin/GetL2Ref.pl?151-C19>] Consensus:* Modify the section on "Best Practices for Using FFFD" in section "3.9 Encoding Forms" of TUS per the recommendation in L2/17-168 <http://www.unicode.org/cgi-bin/GetMatchingDocs.pl?L2/17-168>, for Unicode version 11.0.
Mark (https://twitter.com/mark_e_davis)

On Wed, May 24, 2017 at 3:56 PM, Karl Williamson via Unicode <
unicode@unicode.org> wrote:

> On 05/24/2017 12:46 AM, Martin J. Dürst wrote:
>
>> On 2017/05/24 05:57, Karl Williamson via Unicode wrote:
>>
>>> On 05/23/2017 12:20 PM, Asmus Freytag (c) via Unicode wrote:
>>>
>>>> Adding a "recommendation" this late in the game is just bad standards
>>>> policy.
>>>
>>> Unless I misunderstand, you are missing the point. There is already a
>>> recommendation listed in TUS,
>>
>> That's indeed correct.
>>
>>> and that recommendation appears to have
>>> been added without much thought.
>>
>> That's wrong. There was a public review issue with various options and
>> with feedback, and the recommendation has been implemented and in use
>> widely (among others, in major programming languages and browsers)
>> without problems for quite some time.
>
> Could you supply a reference to the PRI and its feedback?
>
> The recommendation in TUS 5.2 is "Replace each maximal subpart of an
> ill-formed subsequence by a single U+FFFD."
>
> And I agree with that. And I view an overlong sequence as a maximal
> ill-formed subsequence that should be replaced by a single FFFD. There's
> nothing in the text of 5.2 that immediately follows that recommendation
> that indicates to me that my view is incorrect.
>
> Perhaps my view is colored by the fact that I now maintain code that was
> written to parse UTF-8 back when overlongs were still considered legal
> input. An overlong was a single unit. When they became illegal, the code
> still considered them a single unit.
>
> I can understand how someone who comes along later could say C0 can't be
> followed by any continuation character that doesn't yield an overlong,
> therefore C0 is a maximal subsequence.
>
> But I assert that my interpretation is just as valid as that one. And
> perhaps more so, because of historical precedent.
>
> It appears to me that little thought was given to the fact that these
> changes would cause overlongs to now be at least two units instead of one,
> making long-existing code no longer best practice. You are effectively
> saying I'm wrong about this. I thought I had been paying attention to
> PRIs since the 5.x series, and I don't remember anything about this. If
> you have evidence to the contrary, please give it. However, I would have
> thought Markus would have dug any up and given it in his proposal.
>
>>> There is no proposal to add a
>>> recommendation "this late in the game".
>>
>> True. The proposal isn't for an addition, it's for a change. The "late in
>> the game", however, still applies.
>>
>> Regards, Martin.
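To make the two readings of "maximal subpart" in this thread concrete, here is a minimal Python sketch, not taken from TUS or from any implementation mentioned above. decode_maximal_subpart follows the well-formed UTF-8 byte ranges of TUS section 3.9, so a lone C0 (which can never start a well-formed sequence) and each stray continuation byte get their own U+FFFD. decode_structural is a hypothetical model of the older approach Karl describes: a lead byte plus the continuation bytes its bit pattern calls for is one unit, so an overlong such as C0 80 becomes a single U+FFFD. Both function names and the table layout are illustrative assumptions.

```python
# Hypothetical helper table: (first lead byte, last lead byte, number of
# continuation bytes, allowed range of the FIRST continuation byte),
# following the well-formed UTF-8 byte ranges in TUS section 3.9.
_WELL_FORMED = [
    (0xC2, 0xDF, 1, 0x80, 0xBF),
    (0xE0, 0xE0, 2, 0xA0, 0xBF),  # excludes 3-byte overlongs
    (0xE1, 0xEC, 2, 0x80, 0xBF),
    (0xED, 0xED, 2, 0x80, 0x9F),  # excludes surrogates
    (0xEE, 0xEF, 2, 0x80, 0xBF),
    (0xF0, 0xF0, 3, 0x90, 0xBF),  # excludes 4-byte overlongs
    (0xF1, 0xF3, 3, 0x80, 0xBF),
    (0xF4, 0xF4, 3, 0x80, 0x8F),  # excludes code points above U+10FFFF
]


def decode_maximal_subpart(data: bytes) -> str:
    """Replace each maximal subpart of an ill-formed subsequence by one U+FFFD."""
    out, i, n = [], 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
            continue
        entry = next((e[2:] for e in _WELL_FORMED if e[0] <= b <= e[1]), None)
        if entry is None:
            # C0, C1, F5..FF, or a stray continuation byte: it can never start
            # a well-formed sequence, so it is a maximal subpart by itself.
            out.append("\ufffd")
            i += 1
            continue
        need, lo, hi = entry
        cp = b & (0x3F >> need)  # payload bits of the lead byte
        j = i + 1
        while j < n and j - i <= need:
            # Only the first continuation byte has a narrowed range.
            low, high = (lo, hi) if j == i + 1 else (0x80, 0xBF)
            if not low <= data[j] <= high:
                break
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        # Either a complete code point, or data[i:j] is one maximal subpart.
        out.append(chr(cp) if j - i == need + 1 else "\ufffd")
        i = j
    return "".join(out)


def decode_structural(data: bytes) -> str:
    """Hypothetical alternative: a structurally complete sequence (lead byte
    plus the continuation bytes its bit pattern implies) is one unit, even if
    it is overlong, a surrogate, or above U+10FFFF."""
    out, i, n = [], 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:
            need = 1
        elif 0xE0 <= b <= 0xEF:
            need = 2
        elif 0xF0 <= b <= 0xF7:
            need = 3
        else:
            # Stray continuation byte or F8..FF.
            out.append("\ufffd")
            i += 1
            continue
        cp = b & (0x3F >> need)
        j = i + 1
        while j < n and j - i <= need and 0x80 <= data[j] <= 0xBF:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        minimum = (0x80, 0x800, 0x10000)[need - 1]  # smallest non-overlong value
        valid = minimum <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF
        out.append(chr(cp) if j - i == need + 1 and valid else "\ufffd")
        i = j
    return "".join(out)
```

With the first function, C0 80 decodes to two U+FFFDs and E0 80 80 to three; with the second, each decodes to a single U+FFFD. Well-formed input such as E2 82 AC (U+20AC) decodes identically under both, so the disagreement is only about how many replacement characters ill-formed input produces.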