The purpose of the Pattern Syntax characters is *not* to list everything that is
a symbol or punctuation mark. That list exists independently. Think of them
instead as operators in an engine's syntax, the way "?" and "*" are used today
in Perl, or the way +, -, /, * could be used in math expressions.

The goal is to have a relatively small, unchangeable list of ranges that
reasonably restricts which characters can become syntax characters in a
general pattern environment in the future. A general regular expression engine,
for example, would *not* add 05C3 HEBREW PUNCTUATION SOF PASUQ as an operator
to indicate (say) a non-greedy variant of the * match.
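
To make the distinction concrete, here is a rough Python sketch. The
SYNTAX_RANGES table is a hypothetical, ASCII-only stand-in that I made up for
illustration; it is not the list of ranges in the draft. The helper names are
also invented. The only real API used is the standard unicodedata module.

```python
import unicodedata

# Hypothetical, ASCII-only stand-in for a small fixed list of syntax ranges.
# Assumption for illustration only: these are NOT the ranges from the draft.
SYNTAX_RANGES = (
    (0x0021, 0x002F),  # ! " # $ % & ' ( ) * + , - . /
    (0x003A, 0x0040),  # : ; < = > ? @
    (0x005B, 0x005E),  # [ \ ] ^   (underscore U+005F is left out)
    (0x0060, 0x0060),  # `
    (0x007B, 0x007E),  # { | } ~
)


def is_syntax_char(ch):
    """True if ch falls inside the fixed (hypothetical) syntax ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SYNTAX_RANGES)


def is_punct_or_symbol(ch):
    """True if the General_Category of ch is any punctuation or symbol class."""
    return unicodedata.category(ch)[0] in ("P", "S")


# Compare the two notions for two Perl operators and for SOF PASUQ.
for ch in ("*", "?", "\u05C3"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "punct/symbol:", is_punct_or_symbol(ch),
        "syntax:", is_syntax_char(ch),
    )
```

In this sketch SOF PASUQ is still reported as punctuation by its general
category, but it is outside the fixed ranges, so a pattern engine following
this approach would never be able to claim it as an operator.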

Mark
__________________________________
http://www.macchiato.com
►  “Eppur si muove” ◄

----- Original Message ----- 
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: "Marco Cimarosti" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, August 22, 2003 07:45
Subject: Re: Proposed Draft UTR #31 - Syntax Characters


> On 22/08/2003 06:04, Marco Cimarosti wrote:
>
> >Rick McGowan wrote:
> >
> >
> >>the process as possible so that it can be considered
> >>The draft is found at http://www.unicode.org/reports/tr31/
> >>and feedback can be submitted as described there.
> >>
> >>
> >
> >(Before submitting official feedback, I'd like to discuss my comments here.
> >BTW, which "Type of Message" should I use in the feedback form? Is it OK to
> >use "Technical Report or Tech Note issues"?)
> >
> >
> >My two cents are both about adding characters in the <Pattern_Syntax> of
> >"4.1 Proposed Pattern Properties".
> >
> >IMHO:
> >
> > 1. Full-width, half-width, and "small" punctuation characters should be
> >in class <Pattern_Syntax>, like their "normal width" counterparts.
> >
> > 2. Non-Latin punctuation characters should be in class
> ><Pattern_Syntax>, like their Latin counterparts.
> >
> >...
> >
> >Should any of the above characters be added to <Pattern_Syntax> (i.e. *not*
> >allowed in identifiers)?
> >
> >_ Marco
> >
> We should include
>
> 05C3 HEBREW PUNCTUATION SOF PASUQ
>
> as this is similar to a colon in function and, at least in many Hebrew
> fonts, in appearance. Also, if the ordinary Latin hyphen, quotation mark,
> vertical line etc. are included, so should be
>
> 05BE HEBREW PUNCTUATION MAQAF
> 05C0 HEBREW PUNCTUATION PASEQ
> 05F3 HEBREW PUNCTUATION GERESH
> 05F4 HEBREW PUNCTUATION GERSHAYIM
>
> and equivalents in Armenian, Syriac, etc. Indeed, why not include
> everything with punctuation properties? According to tr31 "some
> script-specific characters were removed". Why? What remains is also
> script-specific, but just for Latin script.
>
> -- 
> Peter Kirk
> [EMAIL PROTECTED] (personal)
> [EMAIL PROTECTED] (work)
> http://www.qaya.org/