Metacharacters in character classes

Carl Mäsak Thu, 26 Mar 2009 07:50:34 -0700

It started by yours truly asking impertinent questions on #perl6...

 <http://irclog.perlgeek.de/perl6/2009-03-26#i_1018345>


...and ended with a general feeling that the way metacharacters and
backwhacking work in <[ ]> character classes is at worst inconsistent
and at best underspecified by S05.

Specifically, the following paragraphs from that spec do _not_ hold
for character classes, which are more like a sublanguage of their own:

] Unlike traditional regular expressions, Perl 6 does not require
] you to memorize an arbitrary list of metacharacters.  Instead it
] classifies characters by a simple rule.  All glyphs (graphemes)
] whose base characters are either the underscore (C<_>) or have
] a Unicode classification beginning with 'L' (i.e. letters) or 'N'
] (i.e. numbers) are always literal (i.e. self-matching) in regexes. They
] must be escaped with a C<\> to make them metasyntactic (in which
] case that single alphanumeric character is itself metasyntactic,
] but any immediately following alphanumeric character is not).
]
] All other glyphs--including whitespace--are exactly the opposite:
] they are always considered metasyntactic (i.e. non-self-matching) and
] must be escaped or quoted to make them literal.  As is traditional,
] they may be individually escaped with C<\>, but in Perl 6 they may
] be also quoted as follows.

In character classes, most 'other glyphs' mean themselves, just like
alphanumerics, with a few notable exceptions: backslash (\), closing
bracket (]), dash (-) and whitespace still need to be backwhacked.
All other characters are treated literally, including dot (.), which is
actually used for metasyntactic purposes in character classes (two dots
form the .. range operator, as in <[a..z]>). In other words, currently
/<[.]>/ is legal, but /<[-]>/ is not. Which is kinda weird, if you ask
me.
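
To make the cases above concrete, here is a small sketch of the forms in
question. It only restates the behaviour described in this mail; I haven't
checked it against any particular implementation, and the test strings are
just illustrative. No expected output is given, since that is exactly what
is underspecified.

```perl6
# Characters described above as needing a backslash inside <[ ]>:
say ?( '-'  ~~ / <[\-]> / );    # dash
say ?( ']'  ~~ / <[\]]> / );    # closing bracket
say ?( '\\' ~~ / <[\\]> / );    # backslash
say ?( ' '  ~~ / <[\ ]> / );    # whitespace (a single space)

# Dot needs no backslash, even though '..' builds ranges in a class:
say ?( '.'  ~~ / <[.]> / );     # a lone dot, taken literally
say ?( 'b'  ~~ / <[a..c]> / );  # two dots, taken as a range

# The form described above as not legal, left commented out:
# say ?( '-' ~~ / <[-]> / );
```

So a bare dot is fine although dots carry meaning in a class, while a bare
dash is rejected although it carries none.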

See the linked #perl6 log for details.

What's the big-picture rule of thumb regarding metacharacters in
character classes?

// Carl
