Dan Kogai <[EMAIL PROTECTED]> writes:

> Since Gisle's patch makes use of utf8n_to_uvuni(), it seems to be a
> problem of perl core.  So I have checked utf8.c which defines that.
> Seems like it does not make use of PERL_UNICODE_MAX.
>
> The patch against utf8.c fixes that.

Seems like a good idea to have a workaround in Encode for this as well.

Index: users/gisle/hacks/Encode/Encode.xs
--- Encode/Encode.xs.~1~ Mon Dec 6 10:44:31 2004
+++ Encode/Encode.xs Mon Dec 6 10:44:31 2004
@@ -300,6 +300,10 @@
                     UTF8_CHECK_ONLY | (strict ? UTF8_ALLOW_STRICT :
                                        UTF8_ALLOW_NONSTRICT)
                    );
+#if 1 /* perl-5.8.6 and older do not check UTF8_ALLOW_LONG */
+        if (strict && uv > PERL_UNICODE_MAX)
+            ulen = -1;
+#endif
         if (ulen == -1) {
             if (strict) {
                 uv = utf8n_to_uvuni(s, e - s, &ulen,
End of Patch.

> --- perl-5.8.x/utf8.c Wed Nov 17 23:11:04 2004
> +++ perl-5.8.x.dan/utf8.c Sun Dec 5 11:38:52 2004
> @@ -429,6 +429,13 @@
>          }
>          else
>              uv = UTF8_ACCUMULATE(uv, *s);
> +        /* Checks if ord() > 0x10FFFF -- dankogai */
> +        if (uv > PERL_UNICODE_MAX){
> +            if (!(flags & UTF8_ALLOW_LONG)) {
> +                warning = UTF8_WARN_LONG;
> +                goto malformed;
> +            }
> +        }
>          if (!(uv > ouv)) {
>              /* These cannot be allowed. */
>              if (uv == ouv) {
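
For anyone who wants to see the effect of the range check outside of perl, here is a rough standalone sketch of the same idea. It is not the code from utf8.c or Encode.xs: sketch_decode() and SKETCH_UNICODE_MAX are made-up names, the decoder only handles sequences of up to four bytes, and it does not check for overlong forms. The `strict` argument plays the role of the `strict` variable in the Encode.xs hunk above, and *len is set to -1 on rejection in the same way ulen is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for perl's PERL_UNICODE_MAX (0x10FFFF). */
#define SKETCH_UNICODE_MAX 0x10FFFFUL

/*
 * Decode one UTF-8 style sequence from s (at most n bytes).
 * Returns the code point and stores the number of bytes used in *len.
 * On malformed input, or (when strict is non-zero) on a value above
 * SKETCH_UNICODE_MAX, stores -1 in *len and returns 0.
 * Simplification: overlong encodings are not detected here.
 */
static unsigned long
sketch_decode(const unsigned char *s, size_t n, int strict, int *len)
{
    unsigned long uv;
    int need, i;

    assert(s != NULL && len != NULL);
    *len = -1;
    if (n == 0)
        return 0;

    if (s[0] < 0x80)      { uv = s[0];        need = 1; }
    else if (s[0] < 0xC0) { return 0; }   /* stray continuation byte */
    else if (s[0] < 0xE0) { uv = s[0] & 0x1F; need = 2; }
    else if (s[0] < 0xF0) { uv = s[0] & 0x0F; need = 3; }
    else if (s[0] < 0xF8) { uv = s[0] & 0x07; need = 4; }
    else                  { return 0; }   /* longer forms not handled */

    if ((size_t)need > n)
        return 0;                         /* sequence is truncated */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                     /* not a continuation byte */
        uv = (uv << 6) | (unsigned long)(s[i] & 0x3F);
    }

    /* The range check: refuse anything above the Unicode maximum
       when the caller asked for strict decoding. */
    if (strict && uv > SKETCH_UNICODE_MAX)
        return 0;

    *len = need;
    return uv;
}
```

With this sketch, the bytes F4 8F BF BF (U+10FFFF) decode in both modes, while F4 90 80 80 (0x110000) is accepted only when strict is 0.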