On Sat, Dec 04, 2004 at 01:40:30PM +0900, Dan Kogai wrote:
> On Dec 04, 2004, at 11:51, Larry Wall wrote:
> >On Fri, Dec 03, 2004 at 10:12:12PM +0000, Tim Bunce wrote:
> >: I've no problem with 'utf8' being perl's unrestricted utf8 encoding,
> >: but "UTF-8" is the name of the standard and should give the
> >: corresponding behaviour.
> >
> >For what it's worth, that's how I've always kept them straight in my
> >head.
> >
> >Also for what it's worth, Perl 6 will mostly default to strict but make
> >it easy to switch back to lax.
> >
> >Larry
>
> Okay, looks like the verdict is reached.
>
> 1. "utf8" will stay liberal
> 2. "UTF-8" will be strict
>
> The rest is mostly implementation.
>
> 2.1. What will the canonical name of the strict version of "UTF-8" be?
> Gisle already submitted me a test patch and it uses 'utf-8-strict'.
> If there is no objection, I would like to use that.

"UTF-8" is the name of the standard and should give the corresponding
behaviour. Why not use "UTF-8" as the canonical name of the behaviour
that matches the "UTF-8" standard? Strictness should be implied by the
fact that it's the official name of the encoding.

> 2.2. CAVEAT: "UTF8" will be "utf8", not "utf-8-strict", since Encode
> aliasing is case insensitive.
>
> 2.3. Degree of stricture. How strict are we going to make utf-8-strict?
>    a. simply make use of UTF8_ALLOW_* in utf8.h?
>    b. unmapped codepoints banned as well?
> IMHO a. is strict enough since mapped codepoints are subject to
> increase as the Unicode Standard updates.

Overlong sequences (i.e. security) are the only concern I have.

Tim.

> 2.4 We can always make "UTF-8" liberal by reapplying alias.
>
> Anything else missing?
>
> Dan the Encode Maintainer
>
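
To illustrate why overlong sequences are the security concern, here is a
minimal sketch in Python (not Perl, and not how Encode or utf8.h is
implemented). The names decode_lax and is_strict_utf8 are hypothetical,
made up for this example. The lax decoder only checks the bit patterns
and never checks that the shortest form was used, so the two-byte
sequence C0 AF decodes to "/" and can slip past a filter that only looks
for the byte 2F. Python's built-in "utf-8" codec stands in for the strict
behaviour here, since it rejects overlong forms.

```python
def decode_lax(data: bytes) -> str:
    """Hypothetical lax decoder: checks bit patterns only.

    It performs no overlong, surrogate, or range checks, which is the
    behaviour a strict decoder is meant to rule out.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                 # 0xxxxxxx: single byte
            cp, extra = lead, 0
        elif lead & 0xE0 == 0xC0:       # 110xxxxx: one continuation byte
            cp, extra = lead & 0x1F, 1
        elif lead & 0xF0 == 0xE0:       # 1110xxxx: two continuation bytes
            cp, extra = lead & 0x0F, 2
        elif lead & 0xF8 == 0xF0:       # 11110xxx: three continuation bytes
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for j in range(1, extra + 1):
            cont = data[i + j]
            if cont & 0xC0 != 0x80:     # continuation bytes are 10xxxxxx
                raise ValueError(f"invalid continuation byte at offset {i + j}")
            cp = (cp << 6) | (cont & 0x3F)
        out.append(chr(cp))
        i += extra + 1
    return "".join(out)


def is_strict_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 per Python's strict codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


overlong_slash = b"\xc0\xaf"            # overlong two-byte encoding of U+002F
print(repr(decode_lax(overlong_slash)))  # lax decoder yields '/'
print(is_strict_utf8(overlong_slash))    # strict codec rejects it
```

The point of the sketch is only that a byte-level check for "/" sees
neither C0 nor AF as a slash, while the lax decoder later produces one.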