Re: More character matching bits

Jarkko Hietaniemi Mon, 11 Jun 2001 13:20:57 -0700

On Mon, Jun 11, 2001 at 01:05:43PM -0700, Russ Allbery wrote:
> Dan Sugalski <[EMAIL PROTECTED]> writes:
> 
> > Should perl's regexes and other character comparison bits have an option
> > to consider different characters for the same thing as identical beasts? 
> > I'm thinking in particular of the Katakana/Hiragana bits of Japanese,
> > but other languages may have the same concepts.
> 
> I think canonicalization gets you that if that's what you want.  I
> definitely think that Perl should be able to do all of NFD, NFC, NFKD, and
> NFKC canonicalization.

Agreed.

> NFC will collapse most different characters for the same thing to a single
> character and get rid of most of the compatibility characters for you.
> NFKC will go further and do stuff like getting rid of superscripts and the
> like.
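
For concreteness, here is a minimal sketch of what the four forms do to a few sample characters. It is written in Python only because its standard unicodedata module provides all four forms. It is not a proposal for how Perl should spell this. The helper names forms() and fold_kana() are hypothetical, made up for illustration. Two caveats the sketch makes visible: NFC leaves compatibility characters such as superscript two alone (only the K forms fold them), and no normalization form equates Hiragana with Katakana, so that kind of matching would need a separate folding step such as the assumed fold_kana() below.

```python
import unicodedata

FORMS = ("NFD", "NFC", "NFKD", "NFKC")

def forms(s):
    """Return a dict mapping each normalization form name to the normalized string."""
    return {name: unicodedata.normalize(name, s) for name in FORMS}

def fold_kana(s):
    """Hypothetical helper, NOT a Unicode normalization form:
    map Hiragana U+3041..U+3096 onto Katakana by adding 0x60."""
    return "".join(
        chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c
        for c in s
    )

if __name__ == "__main__":
    samples = {
        "e + combining acute": "e\u0301",   # canonically equal to U+00E9
        "superscript two":     "x\u00b2",   # compatibility character
        "half-width KA":       "\uff76",    # compatibility form of U+30AB
        "hiragana KA":         "\u304b",    # unchanged by all four forms
    }
    for label, text in samples.items():
        # Print code points so the differences are visible on any terminal.
        result = {name: [hex(ord(c)) for c in value]
                  for name, value in forms(text).items()}
        print(label, result)
    print("fold_kana:", [hex(ord(c)) for c in fold_kana("\u304b")])
```

Comparing two strings after passing both through the same form (and, if wanted, through a kana fold) is then an ordinary equality test.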

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
