# New Ticket Created by Zefram
# Please include the string: [perl #128550]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/Ticket/Display.html?id=128550 >

Built-in character classes such as <lower> consistently accept any
diacritics on a matching base character, matching the whole grapheme:

> /^<lower>$/.ACCEPTS("u\x[308]").Bool
True
> /^<lower>$/.ACCEPTS("n\x[308]").Bool
True

Matching against a literal character or a <[abc]>-type enumerated
character class consistently rejects any diacritics on a matching base
character:

> /^<[nu]>$/.ACCEPTS("u\x[308]").Bool
False
> /^<[nu]>$/.ACCEPTS("n\x[308]").Bool
False

But a <[a..z]>-type range-based character class has inconsistent
behaviour:

> /^<[a..z]>$/.ACCEPTS("u\x[308]").Bool
False
> /^<[a..z]>$/.ACCEPTS("n\x[308]").Bool
True

The behaviour seems to be that if in NFC the first character of the
grapheme is the unadorned base character then it accepts, but if it's
a combined character then it rejects. This dependence on the
representation breaks the grapheme view of the string, and so is
presumably a bug.

I think a <[a..z]>-type range should, with respect to diacritics,
behave either like <lower> or like <[abc]>. I am unable to discern
which is really intended; none of the documentation that I've seen
addresses grapheme semantics.

I note that matching a specified base character with arbitrary
diacritics is a meaningful facility, and given that <lower> et al have
that behaviour it should probably be available somewhere. The character
range feature is almost providing it, but it's obviously not been
designed to, because a single-character range such as <[n..n]> is
rejected.

-zefram
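
The NFC observation in the report can be checked by listing the code points of each test string after normalization. The following is a minimal sketch, not part of the original report: nfc-codes is a hypothetical helper name, and the sketch assumes that Str.NFC and Uni.list behave as documented for Rakudo. It shows only the representation difference between the two strings; it says nothing about how the regex engine implements ranges.

```raku
# Hypothetical helper (not from the report): return the NFC code points
# of a string as space-separated hexadecimal numbers.
sub nfc-codes(Str $s) {
    $s.NFC.list.map(*.fmt('%04X')).join(' ')
}

for "u\x[308]", "n\x[308]" -> $s {
    # Both strings are a single grapheme, so .chars is 1 for each.
    say $s.chars, ' grapheme, NFC code points: ', nfc-codes($s);
}
# 1 grapheme, NFC code points: 00FC
# 1 grapheme, NFC code points: 006E 0308
```

U+0075 followed by U+0308 composes to the single code point U+00FC, which lies outside the code point range a..z, whereas n with a diaeresis has no precomposed form and so still begins with U+006E in NFC. That is consistent with the range match rejecting the first string and accepting the second.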