Re: IDNA2008 Contextual rules clarification

Kenneth Whistler Fri, 29 Oct 2010 14:46:52 -0700

Nagesh Chigurupati asked:

> I have a question regarding some of the contextual rules in RFC5892. For
> example the contextual rule in appendix A.4 Greek Lower Numeral Sign
> (U+0375), states the following:
> 
> If Script(After(cp)) .eq.  Greek Then True;
> 
> If the Greek Lower Numeral Sign (U+0375) is the last code point in the
> IDN, should it be allowed? There are statements in the RFC5892 as
> follows:
> 
> Before(FirstChar) evaluates to Undefined.
> After(LastChar) evaluates to Undefined.
> 
> Can I assume that "Undefined" is not equal to "Greek", and therefore
> input sequences with a trailing Greek Lower Numeral Sign are always
> disallowed by the specification?


Correct.
 
> The Hebrew Punctuation Geresh (U+05F3), Hebrew Punctuation Gershayim
> (U+05F4), etc. also pose a similar question. The rule set for these
> contextual rules states the following:
> 
> If Script(Before(cp)) .eq.  Hebrew Then True;
> 
> So, if the first code point is U+05F3, then should it be disallowed 

Correct.

> as
> there is no code point before this one to assert that it belongs to the
> Hebrew script.

The reasoning there is incorrect, though. The script of
U+05F3 and U+05F4 is Hebrew already. It isn't a matter of a lack
of a previous character to assert this. Rather, the RFC 5892
specification simply states that U+05F3 and U+05F4
are only allowed immediately following a(nother) Hebrew character
in a label.

--Ken
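
[Illustration, not part of the original message.] The two rules discussed
above can be sketched in Python as follows. This is a minimal sketch under
stated assumptions, not a conforming IDNA2008 implementation: the helper
names (script_of, before, after, context_ok) are hypothetical, "Undefined"
is modelled as None, and the Script property is only approximated from
Unicode character names because the Python standard library does not
expose it directly.

```python
import unicodedata


def script_of(ch):
    """Rough stand-in for the Unicode Script property (an assumption).

    Returns None for an undefined position, "Greek" or "Hebrew" when the
    character name starts with that word, and "Other" otherwise. A real
    implementation should consult Scripts.txt from the Unicode Character
    Database instead of guessing from names.
    """
    if ch is None:
        return None  # models "Undefined" in RFC 5892
    name = unicodedata.name(ch, "")
    for script in ("GREEK", "HEBREW"):
        if name.startswith(script + " "):
            return script.capitalize()
    return "Other"


def before(label, i):
    """Code point preceding position i, or None (Undefined) at the start."""
    return label[i - 1] if i > 0 else None


def after(label, i):
    """Code point following position i, or None (Undefined) at the end."""
    return label[i + 1] if i + 1 < len(label) else None


def context_ok(label, i):
    """Evaluate the contextual rule for the code point at label[i].

    Only the three code points from this thread are modelled; each rule
    defaults to False and becomes True when the script test passes.
    """
    cp = label[i]
    if cp == "\u0375":  # GREEK LOWER NUMERAL SIGN (KERAIA)
        return script_of(after(label, i)) == "Greek"
    if cp in ("\u05F3", "\u05F4"):  # HEBREW PUNCTUATION GERESH / GERSHAYIM
        return script_of(before(label, i)) == "Hebrew"
    return True  # no contextual rule modelled for this code point
```

Because None never compares equal to "Greek" or "Hebrew", a trailing
U+0375 and a leading U+05F3 or U+05F4 both fail the check, which matches
the two "Correct." answers above.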
