Re: Make Encode.pm support the real UTF-8

Tim Bunce Sat, 04 Dec 2004 10:26:04 -0800

On Sat, Dec 04, 2004 at 01:40:30PM +0900, Dan Kogai wrote:
> On Dec 04, 2004, at 11:51, Larry Wall wrote:
> >On Fri, Dec 03, 2004 at 10:12:12PM +0000, Tim Bunce wrote:
> >: I've no problem with 'utf8' being perl's unrestricted utf8 encoding,
> >: but "UTF-8" is the name of the standard and should give the
> >: corresponding behaviour.
> >
> >For what it's worth, that's how I've always kept them straight in my 
> >head.
> >
> >Also for what it's worth, Perl 6 will mostly default to strict but make
> >it easy to switch back to lax.
> >
> >Larry
> 
> Okay, looks like the verdict is reached.
> 
> 1.  "utf8" will stay liberal
> 2.  "UTF-8" will be strict
> 
> The rest is mostly implementation.
> 
> 2.1.  What will the canonical name of the strict version of "UTF-8" be?
> Gisle already sent me a test patch and it uses 'utf-8-strict'.
> If there is no objection, I would like to use that.


"UTF-8" is the name of the standard and should give the corresponding
behaviour.  Why not use "UTF-8" as the canonical name of the
behaviour that matches the "UTF-8" standard? Strictness should be
implied by the fact it's the official name of the encoding.

> 2.2.  CAVEAT: "UTF8" will be "utf8", not "utf-8-strict", since Encode 
> aliasing is case insensitive.
> 
> 2.3.  Degree of stricture. How strict are we going to make utf-8-strict?
>    a. simply make use of UTF8_ALLOW_* in utf8.h ?
>    b. unmapped codepoints banned as well?
>    IMHO a. is strict enough, since the set of mapped codepoints is
>    subject to increase as the Unicode Standard is updated.

Overlong sequences (i.e. security) are the only concern I have.
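
To illustrate why overlong forms are a security matter, here is a minimal
sketch. It is written in Python rather than Perl purely so that it is
self-contained, and it is not code from this thread or from Encode; the
helper name is_rejected is hypothetical. The byte pair C0 AF is an overlong
two-byte encoding of "/" (U+002F). A decoder that accepts it lets a "/"
slip past any filter that only looks for the single byte 2F, so a strict
UTF-8 decoder has to refuse it.

```python
def is_rejected(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder refuses the byte string."""
    try:
        data.decode("utf-8")  # Python's built-in codec rejects overlong forms
    except UnicodeDecodeError:
        return True
    return False


# Shortest (valid) encoding of "/" and two overlong encodings of it.
SHORTEST_SLASH = b"\x2f"
OVERLONG_SLASH_2 = b"\xc0\xaf"      # two-byte overlong form
OVERLONG_SLASH_3 = b"\xe0\x80\xaf"  # three-byte overlong form

if __name__ == "__main__":
    for label, raw in [
        ("shortest", SHORTEST_SLASH),
        ("overlong-2", OVERLONG_SLASH_2),
        ("overlong-3", OVERLONG_SLASH_3),
    ]:
        print(label, "rejected" if is_rejected(raw) else "accepted")
```

Only the shortest form should be accepted; both overlong forms should be
refused by a decoder that follows the standard.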

Tim.

> 2.4.  We can always make "UTF-8" liberal by reapplying the alias.
> 
> Anything else missing?
> 
> Dan the Encode Maintainer
>
