The purpose of the Pattern Syntax characters is *not* to list everything that is a symbol or punctuation mark; that information already exists independently. Think of them as operators in the syntax of a pattern engine, the way "?" or "*" are used today in Perl, or the way +, -, /, * are used in math expressions.
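To illustrate the distinction, here is a minimal Python sketch. It assumes, purely for illustration, that the reserved syntax set is just the ASCII punctuation and symbol ranges; the actual list is whatever the UTR #31 draft defines, and the names SYNTAX_RANGES, is_reserved_for_syntax and is_punctuation_or_symbol are hypothetical.

```python
import unicodedata

# Illustrative assumption: ASCII punctuation/symbol ranges stand in for the
# fixed list of syntax ranges. The real list is defined by the UTR #31 draft.
SYNTAX_RANGES = (
    (0x0021, 0x002F),  # ! " # $ % & ' ( ) * + , - . /
    (0x003A, 0x0040),  # : ; < = > ? @
    (0x005B, 0x005E),  # [ \ ] ^
    (0x0060, 0x0060),  # `
    (0x007B, 0x007E),  # { | } ~
)


def is_reserved_for_syntax(ch: str) -> bool:
    """True if ch falls in the small, fixed set of ranges reserved for operators."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SYNTAX_RANGES)


def is_punctuation_or_symbol(ch: str) -> bool:
    """True if the general category of ch is punctuation (P*) or symbol (S*)."""
    return unicodedata.category(ch)[0] in ("P", "S")


# "?" and "*" are both punctuation and reserved as operators.
# U+05C3 HEBREW PUNCTUATION SOF PASUQ is punctuation, but not reserved here.
for ch in ("?", "*", "\u05C3"):
    print(f"U+{ord(ch):04X}", is_punctuation_or_symbol(ch), is_reserved_for_syntax(ch))
```

The two predicates answer different questions: the first is a fixed table lookup that a pattern engine can rely on, the second is the general character property.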
The goal is to have a relatively small, unchangeable list of ranges that reasonably restricts which characters can become syntax characters in a general pattern environment in the future. General regular expression engines, for example, would *not* add 05C3 HEBREW PUNCTUATION SOF PASUQ as an operator, to indicate (say) a non-greedy match variant of *.

Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄

----- Original Message -----
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: "Marco Cimarosti" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, August 22, 2003 07:45
Subject: Re: Proposed Draft UTR #31 - Syntax Characters

> On 22/08/2003 06:04, Marco Cimarosti wrote:
>
> >Rick McGowan wrote:
> >
> >>the process as possible so that it can be considered
> >>The draft is found at http://www.unicode.org/reports/tr31/
> >>and feedback can be submitted as described there.
> >
> >(Before submitting official feedback, I'd like to discuss my comments here.
> >BTW, which "Type of Message" should I use in the feedback form? Is it OK to
> >use "Technical Report or Tech Note issues"?)
> >
> >My two cents are both about adding characters in the <Pattern_Syntax> of
> >"4.1 Proposed Pattern Properties".
> >
> >IMHO:
> >
> > 1. Full-width, half-width, and "small" punctuation characters should
> >in class <Pattern_Syntax> as their "normal width" counterparts.
> >
> > 2. Non-Latin punctuation character should be in class
> ><Pattern_Syntax> as their Latin counterparts.
> >
> >...
> >
> >Should any of the above character be added to <Pattern_Syntax> (i.e. *not*
> >allowed in identifiers)?
> >
> >_ Marco
>
> We should include
>
> 05C3 HEBREW PUNCTUATION SOF PASUQ
>
> as this is similar in appearance, at least in many Hebrew fonts, and
> function to a colon. Also if the ordinary Latin hyphen, quotation mark,
> vertical line etc are included, so should be
>
> 05BE HEBREW PUNCTUATION MAQAF
> 05C0 HEBREW PUNCTUATION PASEQ
> 05F3 HEBREW PUNCTUATION GERESH
> 05F4 HEBREW PUNCTUATION GERSHAYIM
>
> and equivalents in Armenian, Syriac etc etc. Indeed why not include
> everything with punctuation properties? According to tr31 "some
> script-specific characters were removed". Why? What remains is also
> script-specific, but just for Latin script.
>
> --
> Peter Kirk
> [EMAIL PROTECTED] (personal)
> [EMAIL PROTECTED] (work)
> http://www.qaya.org/

