On Wed, Jul 07, 2004 at 08:09:51PM -0700, Larry Wall wrote:
: On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote:
: : On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
: : > This has no direct bearing on p6l, since performance is a p6i issue.
: : > But perhaps in the interests of performance as well as hackery we
: : > should explicitly provide some sort of variant regex behavior:
: : > 
: : >     /a./ :bytes
: : >     /a./ :graphemes
: : > 
: : > where the first would recognize 0x61 followed by any single byte, while
: : > the second would recognize 'a' followed by any number of bytes
: : > composing a single grapheme.
: : 
: : Isn't that what :u0, :u1, :u2, and :u3 are for?
: : 
: :         :u0         # use bytes       (. is byte)
: :         :u1         # level 1 support (. is codepoint)
: :         :u2         # level 2 support (. is grapheme)
: :         :u3         # level 3 support (. is language dependent)
: 
: These modifiers might get renamed to match whatever b/c/g/w convention
: we come up with for pragmas.  The levels aren't all that intuitive, though
: there is a kind of progression of semantic complexity that would get
: lost with ordinary names.

On the flip side, a good reason to get rid of the numeric values is
that in all likelihood people will continually make the mistake of
thinking :u1 means "one byte at a time" and :u2 means "two bytes at
a time".  And then they'll wonder why :u4 doesn't give them UTF-32...

Larry
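
To make the byte/codepoint/grapheme distinction in the thread concrete, here is a minimal sketch in Python (not Perl 6, and not code from anyone in the thread). It shows what "." after a literal "a" would capture at each level. The function name dot_after_a and the level names are hypothetical, chosen only for illustration. The grapheme case is approximated as "one codepoint plus any following combining marks" using unicodedata.combining, which is an assumption and not full Unicode grapheme cluster segmentation.

```python
import re
import unicodedata

# 'a', then U+00F6 (o with diaeresis), then U+0323 (combining dot below).
# The last two codepoints together form a single user-perceived character.
TEXT = "a\u00f6\u0323"


def dot_after_a(text, level):
    """Return what '.' would capture in /a./ at the given level.

    `level` is a hypothetical name: "bytes", "codepoints", or "graphemes".
    """
    if level == "bytes":
        # A bytes pattern makes '.' match exactly one byte of the UTF-8 form.
        m = re.match(rb"a(.)", text.encode("utf-8"), re.DOTALL)
        return m.group(1)
    if level == "codepoints":
        # A str pattern makes '.' match exactly one codepoint.
        m = re.match(r"a(.)", text, re.DOTALL)
        return m.group(1)
    if level == "graphemes":
        # Approximation: take one codepoint, then absorb combining marks.
        m = re.match(r"a(.)", text, re.DOTALL)
        end = m.end()
        while end < len(text) and unicodedata.combining(text[end]):
            end += 1
        return text[m.start(1):end]
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    for level in ("bytes", "codepoints", "graphemes"):
        print(level, repr(dot_after_a(TEXT, level)))
```

At the byte level the capture is the single byte 0xC3, which is only the first half of the UTF-8 encoding of U+00F6. At the codepoint level it is U+00F6 alone, and at the grapheme level it is U+00F6 together with the combining mark. None of the three corresponds to "N bytes at a time", which is the misreading of the numeric names described above.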
