OK, there's a lot here and my head is swimming, so let me re-consolidate and re-state (BTW: thanks Jon, you've really helped me understand, here).
1) The spec is somewhat vague, but the proposal that I made for single characters is not an unreasonable interpretation of what's there. Thus, could we adopt the script/major category/minor category triplet as the core tool that .succ will use for single, non-combining, non-modifying, valid characters?

2) The spec doesn't put this information anywhere near the definition of the range operator. Perhaps we can add a note? This was a source of confusion for me.

3) It seems that there are two competing multi-character approaches, and both seem somewhat valid. Should we use a pragma to toggle between behavior A and behavior B?

A: "aa" .. "bb" contains "az"
B: "aa" .. "bb" contains ONLY "aa", "ab", "ba" and "bb"

4) About the ranges I gave as examples, you asked:

"Which codepoint is invalid, and why?"

There's just an undefined codepoint smack in the middle of the Greek uppercase letters (U+03A2). I'm sure the Unicode specs have a rationale for that somewhere, but my guess is that there's some thousand-year-old debate about the Greek alphabet behind it.

"In both of these cases, what do you think it should produce?"

I actually gave that answer a bit later on. I think that "Ā" .. "Ē" should produce

ĀĂĄĆĈĊČĎĐĒ

and オ .. ヺ should produce

オカガキギクグケゲコゴサザシジスズセゼソゾタダチヂツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモヤユヨラリルレロワヰヱヲンヴヷヸヹヺ

which are all of the Katakana syllabic characters.

"I also have to wonder how or if "0" ... "z" ought to be resolved. If you're thinking in terms of the alphabet or digits, this is nonsensical"

Well, since you agreed with my statement about the properties checking, it would be 0 through 9 and then a through z: 0 through 9 are digits, matching the LHS's properties, and a through z are lowercase Latin letters, matching the RHS's properties.
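To make the difference between A and B in point 3 concrete, here is a rough sketch in Python (not Perl 6, and not taken from the spec). The function names succ_str, range_a and range_b are made up for illustration, and the sketch assumes equal-length strings of lowercase ASCII letters only.

```python
from itertools import product


def succ_str(s):
    """Odometer-style successor for a lowercase ASCII string (assumed a..z only)."""
    chars = list(s)
    for i in range(len(chars) - 1, -1, -1):
        if chars[i] != "z":
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        chars[i] = "a"  # wrap this position and carry to the left
    return "a" + "".join(chars)  # every position wrapped, so grow by one


def range_a(lo, hi):
    """Behavior A: step from lo to hi with succ_str, carrying between positions."""
    # Guard the assumptions so the loop below is certain to reach hi.
    if len(lo) != len(hi) or lo > hi:
        return []
    if any(c < "a" or c > "z" for c in lo + hi):
        return []
    out = [lo]
    while out[-1] != hi:
        out.append(succ_str(out[-1]))
    return out


def range_b(lo, hi):
    """Behavior B: each position ranges independently between its own endpoints."""
    columns = [[chr(c) for c in range(ord(a), ord(b) + 1)] for a, b in zip(lo, hi)]
    return ["".join(t) for t in product(*columns)]


print(range_a("aa", "bb"))  # 28 strings: "aa" .. "az", then "ba", "bb"
print(range_b("aa", "bb"))  # ['aa', 'ab', 'ba', 'bb']
```

Under A the range is a walk through successive strings, so "az" appears on the way from "aa" to "bb". Under B each character position is its own small range, and the result is the cross product of those.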
For reference, this is the relevant section of the spec:

    Character positions are incremented within their natural range for any Unicode range that is deemed to represent the digits 0..9 or that is deemed to be a complete cyclical alphabet for (one case of) a (Unicode) script. Only scripts that represent their alphabet in codepoints that form a cycle independent of other alphabets may be so used. (This specification defers to the users of such a script for determining the proper cycle of letters.) We arbitrarily define the ASCII alphabet not to intersect with other scripts that make use of characters in that range, but alphabets that intersperse ASCII letters are not allowed.

I'm not sure that all of that tracks with the Unicode standard's use of some of the terms, but based on what we've discussed, perhaps we could get more specific there:

    Character positions are incremented within their Unicode Script, but only in keeping with their General Category property. Thus C<"A"++> yields C<"B">, which is the next codepoint, but C<"Ă"++> yields C<"Ą"> even though "ă" falls between the two when incrementing codepoints. Should this prove problematic for any specific Unicode Script which requires special handling (e.g. because a "letter" really isn't used as a letter at all), such special handling may be applied, but the above is the general rule.

and then in the section on ranges:

    As discussed previously, incrementing a character (which is to say, invoking C<.succ>) seeks the next codepoint with the same Unicode Script and General Category properties (major and minor category to be specific). For ranges, succession is the same if C<.min> and C<.max> have the same properties, but if they do not, then all codepoints are considered which are greater than C<.min> and smaller than C<.max> and which agree with either the properties of C<.min> I<or> the properties of C<.max>.
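As a sanity check on the proposed wording, here is a rough Python sketch of the rule (again, not Perl 6 and not an implementation of the spec). One loud assumption: Python's standard unicodedata module has no Script property, so the helper props below uses the first word of the character name (e.g. "LATIN", "GREEK") as a stand-in for the script. That is only an approximation of real Unicode Script data. The names props, succ and char_range are invented for this example, and the range includes both endpoints.

```python
import unicodedata


def props(ch):
    """Return (script_guess, general_category) for a single character.

    ASSUMPTION: the first word of the Unicode name stands in for the Script
    property, which the standard library does not expose.
    """
    name = unicodedata.name(ch, "")
    script_guess = name.split(" ")[0] if name else None
    return (script_guess, unicodedata.category(ch))


def succ(ch, limit=0x10FFFF):
    """Next codepoint after ch with the same (script_guess, category) pair."""
    want = props(ch)
    for cp in range(ord(ch) + 1, limit + 1):
        if props(chr(cp)) == want:
            return chr(cp)
    raise ValueError("no successor with matching properties")


def char_range(lo, hi):
    """Codepoints from lo to hi that match the properties of lo or of hi."""
    allowed = {props(lo), props(hi)}
    return [chr(cp) for cp in range(ord(lo), ord(hi) + 1)
            if props(chr(cp)) in allowed]


print(succ("A"))                        # B
print(succ("Ă"))                        # Ą (skips the lowercase ă at U+0103)
print(succ("Ρ"))                        # Σ (skips the unassigned U+03A2)
print("".join(char_range("Ā", "Ē")))    # ĀĂĄĆĈĊČĎĐĒ
print("".join(char_range("0", "z")))    # 0123456789abcdefghijklmnopqrstuvwxyz
```

The "0" .. "z" case shows the either/or clause at work: uppercase A through Z and the punctuation in between match neither endpoint's properties, so they drop out.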