On 21/08/2003 08:38, Mark Davis wrote:

I suspect your distinction is a bit too subtle to be useful. Having, for
example, an RLM only have an effect when adjacent to a space in a regular
expression would be pretty prone to error, especially since the character
would be invisible.

The reason for allowing LRM and RLM is to be able to make patterns readable. If
you have some syntax like
/AB?(\p{letter})*A\n.../
(where the uppercase represents Hebrew), then bidi display of the neutrals
renders the pattern almost completely illegible. Inserting LRMs or RLMs at
appropriate points straightens out the display. In a special "pattern UI", one
could override the (or some) neutrals to have a strong direction, but most
patterns are viewed and edited in plaintext editors.

My recommendation for pattern syntax would be to quote all
Default_Ignorable_Code_Points if they are actually to be part of literals.
Otherwise the maintenance of such regular expressions (or queries, or rules,
etc.) becomes quite difficult, since the DICP are invisible by default.
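
As an illustration of the quoting recommendation, here is a minimal sketch in Python that rewrites invisible characters in a literal as visible \uXXXX escapes. The function name quote_ignorables and the set DEFAULT_IGNORABLE_SUBSET are hypothetical, and the set is only a small hand-picked subset of Default_Ignorable_Code_Point, not the full property (Python's standard library does not expose it directly).

```python
# Hypothetical helper. The set below is a small, incomplete subset of
# Default_Ignorable_Code_Point, listed by hand for illustration only.
DEFAULT_IGNORABLE_SUBSET = {
    0x00AD,  # SOFT HYPHEN
    0x200B,  # ZERO WIDTH SPACE
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x200E,  # LEFT-TO-RIGHT MARK (LRM)
    0x200F,  # RIGHT-TO-LEFT MARK (RLM)
    0x2060,  # WORD JOINER
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE
}


def quote_ignorables(literal: str) -> str:
    """Replace each listed invisible character with a visible \\uXXXX escape."""
    return "".join(
        f"\\u{ord(ch):04X}" if ord(ch) in DEFAULT_IGNORABLE_SUBSET else ch
        for ch in literal
    )


# An RLM that is meant to be matched literally becomes visible in the source.
print(quote_ignorables("a\u200fb"))
```

With the escape in place, anyone maintaining the pattern can see that the RLM is part of the literal rather than a display aid.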

Mark
__________________________________
http://www.macchiato.com
►  “Eppur si muove” ◄



Understood, I think. I agree that literal default ignorables should be quoted. My concern was that in an example like yours, if RLM and LRM alone are taken as whitespace, they might be taken as terminating the whole pattern, which would defeat your purpose of allowing them to be inserted in the pattern so that it displays as required. That was why I wanted them to be ignored in the patterns. But maybe I am not understanding enough of the context of the whole syntax here.
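
One way to picture the "ignored in patterns" behaviour is the following sketch in Python. It assumes a preprocessing step, here a hypothetical function strip_bare_marks, that drops bare LRM and RLM characters before the pattern is compiled, while an escaped form such as \u200F (plain ASCII in the pattern source) survives and still matches a literal RLM. This is not a description of any actual pattern syntax under discussion, and a backslash directly followed by a bare mark is not handled.

```python
import re

LRM, RLM = "\u200e", "\u200f"


def strip_bare_marks(pattern: str) -> str:
    """Drop bare LRM/RLM characters from a pattern source.

    Escapes such as \\u200F are ordinary ASCII text in the source,
    so they are left untouched and keep their literal meaning.
    """
    return pattern.replace(LRM, "").replace(RLM, "")


# LRMs inserted only to straighten out the display do not end the pattern
# or change what it matches.
readable = "a" + LRM + "?" + LRM + "b*"
compiled = re.compile(strip_bare_marks(readable))
print(compiled.pattern)

# An escaped RLM is still matched literally after the bare LRM is removed.
literal_rlm = strip_bare_marks(r"x\u200F" + LRM + "y")
print(re.fullmatch(literal_rlm, "x" + RLM + "y") is not None)
```

Under this approach the marks never act as whitespace or as terminators; they only affect how the pattern is displayed.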

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




