Tom Christiansen wrote:
Patrick wrote:

: > * Almost. E.g. isL would be nice to have as well.
:
: Those exist also:
:
:  $ ./perl6
:  > say 'abCD34' ~~ / <isL> /
:  a
:  > say 'abCD34' ~~ / <isN> /
:  3
:  >

They may exist, but I'm not certain it's a good idea to encourage
the Is_XXX approach on *anything* except Script=XXX properties.
They certainly don't work on everything, you know.

Also, I can't for the life of me see why one would ever write <isL> when
<Letter> is so much more obvious; similarly for <isN> over <Number>.
Just because you can do so doesn't mean you necessarily should.

    http://unicode.org/reports/tr18/#Categories

    The recommended names for UCD properties and property values are in
    PropertyAliases.txt [Prop] and PropertyValueAliases.txt [PropValue].
    There are both abbreviated names and longer, more descriptive names.

    It is strongly recommended that both names be recognized, and that
    loose matching of property names be used, whereby the case
    distinctions, whitespace, hyphens, and underbar are ignored.
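As a rough illustration of the short and long names and of case-insensitive matching, here is a small perl5 sketch. The helper name matching_spellings is made up for this example, and the list of spellings is only a sample, not an exhaustive set of what perl accepts.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): report which of a
# few sample spellings of the Letter property match a given character.
sub matching_spellings {
    my ($ch) = @_;
    my %spelling = (
        'L'      => qr/\p{L}/,       # abbreviated name
        'Letter' => qr/\p{Letter}/,  # long name
        'letter' => qr/\p{letter}/,  # case is ignored
        'LETTER' => qr/\p{LETTER}/,
        'IsL'    => qr/\p{IsL}/,     # the Is-prefixed form under discussion
    );
    my @hits = sort grep { $ch =~ $spelling{$_} } keys %spelling;
    return @hits;
}

my @for_letter = matching_spellings('a');
my @for_digit  = matching_spellings('3');
print "a matches: @for_letter\n";
print "3 matches ", scalar(@for_digit), " of the spellings\n";
```

All five spellings name the same property, so a character either matches every one of them or none.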

Furthermore, be aware that the Number property is *NOT* the same
as the Decimal_Number property.  In perl5, if one wants [0-9], then
one expresses it exactly that way, since that's a lot shorter than
writing (?=\p{ASCII})\p{Nd}, where Nd can also be Decimal_Number.

Again, please note that Number is far broader than even Decimal_Number,
which is itself almost certainly broader than you're thinking.
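To make the difference concrete, here is a minimal perl5 sketch that tests a few code points against Number, Decimal_Number, and plain [0-9]. The helper name classify and the choice of sample code points are mine, picked only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): which of the three
# "digit-like" tests does a code point pass?
sub classify {
    my ($cp) = @_;
    my $ch = chr $cp;
    return {
        number         => ($ch =~ /\p{Number}/         ? 1 : 0),
        decimal_number => ($ch =~ /\p{Decimal_Number}/ ? 1 : 0),
        ascii_digit    => ($ch =~ /[0-9]/              ? 1 : 0),
    };
}

# Sample code points, chosen for illustration:
#   U+0035 DIGIT FIVE
#   U+0665 ARABIC-INDIC DIGIT FIVE
#   U+2167 ROMAN NUMERAL EIGHT
#   U+2155 VULGAR FRACTION ONE FIFTH
for my $cp (0x0035, 0x0665, 0x2167, 0x2155) {
    my $r = classify($cp);
    printf "U+%04X  Number=%d  Decimal_Number=%d  [0-9]=%d\n",
        $cp, $r->{number}, $r->{decimal_number}, $r->{ascii_digit};
}
```

Only the first sample passes all three tests. The Arabic-Indic digit is a Decimal_Number but not in [0-9], and the Roman numeral and the fraction are Numbers without being Decimal_Numbers.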

Here's a trio of little programs specifically designed to help scout
out Unicode characters and their properties.  They work best on 5.12+,
but should be ok on 5.10, too.

--tom


The 'Is' prefix can be used on any property in 5.12 for which there is no naming conflict. The only naming conflicts are certain of the block properties, such as Arabic. IsArabic means the Arabic script. InArabic means the base Arabic block. Personally, I find Is and In unintuitive, and prefer to write sc=arabic or blk=arabic instead.
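A small perl5 sketch of the script/block distinction, using the explicit sc= and blk= forms. The helper name script_and_block and the sample code points are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns two flags,
# (in the Arabic script?, in the base Arabic block?), for a code point.
sub script_and_block {
    my ($cp) = @_;
    my $ch = chr $cp;
    return (
        ($ch =~ /\p{sc=arabic}/  ? 1 : 0),   # Script=Arabic
        ($ch =~ /\p{blk=arabic}/ ? 1 : 0),   # Block=Arabic
    );
}

# Sample code points, chosen for illustration:
#   U+0627 ARABIC LETTER ALEF
#   U+060C ARABIC COMMA
#   U+FB50 ARABIC LETTER ALEF WASLA ISOLATED FORM
for my $cp (0x0627, 0x060C, 0xFB50) {
    my ($sc, $blk) = script_and_block($cp);
    printf "U+%04X  sc=arabic:%d  blk=arabic:%d\n", $cp, $sc, $blk;
}
```

The comma sits in the Arabic block but is not assigned to the Arabic script, while the presentation form belongs to the script but lives in a different block. Spelling out sc= or blk= says which of the two questions is being asked.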

When Unicode proposed to add some properties in 5.2 that started with 'Is', there was significant enough protest that they backed off, and promised never to do it again, adding a stability policy to 6.0 to that effect. Apparently a number of languages use 'Is' as a prefix.
