Re: Regular Expression Quick Reference

Tom Christiansen Sun, 27 Jul 2003 08:18:43 -0700

>+   lc          Lowercase a string
>+   lcfirst     Lowercase first char of a string
>+   uc          Uppercase a string
>+   ucfirst     Uppercase first char of a string


Not quite; the last one (for ucfirst or \u) should be Titlecase, 
not Uppercase--and the two are, of course, not always the same.

Consider the "dz" character at U+01F3:

    % perl  -e 'printf "U+%04X\n",  ord          chr 0x1F3'
    U+01F3
    % perl  -e 'printf "U+%04X\n",  ord uc       chr 0x1F3'
    U+01F1
    % perl  -e 'printf "U+%04X\n",  ord ucfirst  chr 0x1F3'
    U+01F2

If you're (usefully) running something like

    % xterm -n unicode -u8 -fn \
        -misc-fixed-medium-r-normal--20-200-75-75-c-100-iso10646-1

then under perl v5.8.1, provided that you've used -C6 or set
PERL_UNICODE to 6 in your environment (or some similarly useful value),
you can look at actual characters on your screen instead of their
numeric codepoints.

    % perl -le 'print chr for 0x1f1 .. 0x1f3'
    DZ
    Dz
    dz

If that's too exotic, consider the "ß" character at U+00DF (more 
common in Germany than in Hungary, unlike "dz"):

    % perl -le 'print           pack "U",        0xDF'
    ß
    % perl -le 'print uc        pack "U",        0xDF'
    SS
    % perl -le 'print ucfirst   pack "U",        0xDF'
    Ss

The funny pack is to force the UTF8 flag when 128 <= codepoint < 256.  
That way we get the correct casing rules loaded for what would otherwise
presumably be treated as a string in an 8-bit encoding.  For this
peculiar character, the POSIX and/or ctype.h charclass macros are of no
use, but the Unicode casing rules are.

    % perl -le 'print           chr 0xDF'
    ß
    % perl -le 'print uc        chr 0xDF'
    ß
    % perl -le 'print ucfirst   chr 0xDF'
    ß

Alas.

--tom
