These kinds of regexes are being developed in various contexts.

For example, there's a group developing regexes for Indic scripts for use with CSS. That effort focuses on the syllable, not least because concepts like "first-letter" used in CSS are not relevant to those scripts.

Then there is ICANN's root zone label generation rules project. That effort focuses on labels, not words. The difference is that it is acceptable for rules on labels to be slightly more restrictive (that is, they are allowed to underproduce), while on the other hand labels are not limited to actual words, so the rules overproduce.

For Unicode's purposes, such overproduction is not necessarily harmful, because ordinary text can contain words as well as domain names. However, any underproduction would have to be remedied, no matter how complex it would make the rule system.

As a matter of practical experience from being involved in the ICANN project mentioned, we've discovered that breaking up these rules makes them easier to understand and handle. We are finding that the majority of rules can be expressed as a left context for a given character (example: X must be preceded by any of ...). Right contexts or dual contexts are needed less often; however, there are some constructs that occur only in syllable-final or word-final position, and modeling those contexts is more complex.

That said, the main motivation for having context rules as part of label generation rules is to prevent characters from occurring in contexts where rendering engines may not be able to deal with them or, alternatively, to eliminate potential alternate orderings that are intended to mean the same thing. We find that for those two purposes, trying to model the full syllable is practically never required.

A./

On 1/10/2017 1:11 AM, Mark Davis ☕️ wrote:
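
As an illustration of the left-context idea in the message above ("X must be preceded by any of ..."), here is a minimal Python sketch. It is not taken from the thread or from any actual ICANN rule set: the rule table, the choice of Devanagari characters, and the names `LEFT_CONTEXT`, `violations` and `is_valid` are all assumptions made for the example.

```python
# Hypothetical left-context rules, for illustration only. Each key is a
# character that is allowed only when the character immediately before it
# belongs to the given set. These are NOT actual Root Zone LGR rules.
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}  # Devanagari KA..HA

LEFT_CONTEXT = {
    "\u093E": CONSONANTS,  # vowel sign AA must be preceded by a consonant
    "\u094D": CONSONANTS,  # virama must be preceded by a consonant
}


def violations(label):
    """Return (index, char) pairs whose left-context rule is not met."""
    bad = []
    for i, ch in enumerate(label):
        allowed = LEFT_CONTEXT.get(ch)
        if allowed is None:
            continue  # no rule is attached to this character
        # A constrained character at position 0 has no left context at all.
        if i == 0 or label[i - 1] not in allowed:
            bad.append((i, ch))
    return bad


def is_valid(label):
    """True if every constrained character has an acceptable left context."""
    return not violations(label)
```

Each rule here looks at exactly one preceding character, so the table can be read and reviewed one entry at a time, without modeling the whole syllable. Right-context or word-final constraints would need a different check and are deliberately left out of this sketch.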

