On 4 Jan 2017, at 23:40, Markus Scherer <[email protected]> wrote:
> 
> On Wed, Jan 4, 2017 at 2:28 AM, Alastair Houghton 
> <[email protected]> wrote:
> > RFC 5893 seems pretty clear to me, and the problem really is that the test 
> > vectors (which come from unicode.org) seem (to me) to be incorrect.
> 
> https://tools.ietf.org/html/rfc5893#section-2 says "The following rule, 
> consisting of six conditions, applies to labels in Bidi domain names."
> 
> That's what the ICU code does -- applying the rule to each label -- and I 
> assume that's the basis for the test data.

Absolutely.  But the crucial part is “in Bidi domain names”.  That is, it 
applies to *all* labels that are part of a Bidi domain name, not just RTL 
labels.  It does not say “applies to RTL labels in Bidi domain names”; in 
fact, the first bullet point at the end of section 2 explicitly states that 
LTR labels are included:

  ...Note that even LTR labels and pure ASCII labels have to be tested.

Not to mention that conditions 5 and 6 of the rule apply specifically to LTR 
labels.

So it’s quite clear that given the domain name “0à.א”, both “א” *and* “0à” need 
to be checked using the Bidi Rule.  “0à” starts with a digit (Bidi class EN), 
whereas condition 1 requires the first character to be L, R or AL.  Unless 
someone can explain why “0à” does not fail the test, surely we all agree that 
line 74 is incorrect:

> B;    0à.\u05D0;      ;       xn--0-sfa.xn--4db       #       0à.א

and similarly with line 93:

> B;    àˇ.\u05D0;      ;       xn--0ca88g.xn--4db      #       àˇ.א
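
For illustration, here is a rough Python sketch of the six conditions as I 
read them in RFC 5893 section 2.  It is not ICU's implementation; it relies on 
the standard unicodedata module for Bidi classes, the function names are made 
up for this example, and the second test string is assumed to be U+00E0 
followed by U+02C7.

```python
import unicodedata

# Bidi classes named in RFC 5893 section 2.
RTL_MARKERS = {"R", "AL", "AN"}  # any of these makes a label an "RTL label"
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label):
    """Return the Bidi class of every character in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def last_non_nsm(classes):
    """Return the last class that is not NSM, or None if there is none."""
    for cls in reversed(classes):
        if cls != "NSM":
            return cls
    return None


def passes_bidi_rule(label):
    """Apply conditions 1-6 to a single label."""
    classes = bidi_classes(label)
    if not classes:
        return False  # assumption: an empty label is treated as failing
    first = classes[0]
    if first in ("R", "AL"):
        # Conditions 2, 3 and 4 (RTL label).
        return (all(c in RTL_ALLOWED for c in classes)
                and last_non_nsm(classes) in ("R", "AL", "EN", "AN")
                and not ("EN" in classes and "AN" in classes))
    if first == "L":
        # Conditions 5 and 6 (LTR label).
        return (all(c in LTR_ALLOWED for c in classes)
                and last_non_nsm(classes) in ("L", "EN"))
    return False  # condition 1: first character must be L, R or AL


def is_bidi_domain(labels):
    """A Bidi domain name contains at least one RTL label."""
    return any(c in RTL_MARKERS
               for label in labels
               for c in bidi_classes(label))


def failing_labels(domain):
    """Return the labels that fail the rule; empty if not a Bidi domain."""
    labels = domain.split(".")
    if not is_bidi_domain(labels):
        return []
    return [label for label in labels if not passes_bidi_rule(label)]


if __name__ == "__main__":
    for name in ("0\u00e0.\u05d0", "\u00e0\u02c7.\u05d0"):
        print(name, failing_labels(name))
```

Note that the rule is applied to every label once any label in the name is 
RTL, which is the point at issue here; a name with no RTL label is not tested 
at all in this sketch.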

> ICU does not currently check for multi-label bidi combinations.

I was a bit puzzled by this, because the code clearly does perform that check 
(in both the C++ and Java versions), and yet the online demo doesn’t appear to 
object to the above test cases.  So I wrote a quick test program against the 
C++ version of ICU 58.2 and fed it both test cases, and, sure enough, ICU 
agrees that there is a Bidi error in both of the above cases.

Kind regards,

Alastair.

--
http://alastairs-place.net

