Re: UAX #29: Ambiguities in WB4, and contributing back testcases

2016-12-22 Thread Daniel Bünzli
On Thursday 22 December 2016 at 23:05, Manish Goregaokar wrote:
> I guess the confusion is, with → rules, do we apply them globally, or
> only apply them when considering subsequent rules?

This was discussed recently. See [1].

Best,
Daniel

[1]

Re: UAX #29: Ambiguities in WB4, and contributing back testcases

2016-12-22 Thread Richard Wordingham
On Thu, 22 Dec 2016 14:05:18 -0800 Manish Goregaokar wrote:
> I guess the confusion is, with → rules, do we apply them globally, or
> only apply them when considering subsequent rules?

I would say the latter. The logic is that you apply the whole set of rules on either side
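
To make the "only when considering subsequent rules" reading concrete, here is a minimal sketch in Rust. It is not the code from the implementation discussed in this thread, and it covers only a small subset of UAX #29 (WB3, WB3a, WB3b, WB4, WB5 and the final "break everywhere else" rule). The `Wb` enum and the `breaks` function are hypothetical names made up for this illustration. The earlier rules look at the raw neighbours, and only the rules after WB4 see the skipped-over base character X on the left.

```rust
// Simplified word-break property values; a hypothetical subset for illustration.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Wb { CR, LF, Newline, Extend, Format, ZWJ, ALetter, Other }

fn is_ignorable(p: Wb) -> bool {
    matches!(p, Wb::Extend | Wb::Format | Wb::ZWJ)
}

fn is_line_end(p: Wb) -> bool {
    matches!(p, Wb::CR | Wb::LF | Wb::Newline)
}

/// For each boundary between props[i] and props[i + 1], returns true if a
/// break is allowed there under the subset of rules sketched here.
fn breaks(props: &[Wb]) -> Vec<bool> {
    let mut out = Vec::new();
    for i in 0..props.len().saturating_sub(1) {
        let (left, right) = (props[i], props[i + 1]);
        // WB3: CR × LF (sees the raw neighbours, before WB4).
        if left == Wb::CR && right == Wb::LF {
            out.push(false);
            continue;
        }
        // WB3a / WB3b: break after and before CR, LF, Newline.
        if is_line_end(left) || is_line_end(right) {
            out.push(true);
            continue;
        }
        // WB4, first effect: never break before Extend, Format or ZWJ.
        if is_ignorable(right) {
            out.push(false);
            continue;
        }
        // WB4, second effect: for the rules that follow, the left side is
        // the base X of "X (Extend | Format | ZWJ)*". Walk back over the
        // ignorable run, stopping at start of text or right after a line end.
        let mut j = i;
        while j > 0 && is_ignorable(props[j]) && !is_line_end(props[j - 1]) {
            j -= 1;
        }
        let effective_left = props[j];
        // WB5: ALetter × ALetter, evaluated on the effective left side.
        if effective_left == Wb::ALetter && right == Wb::ALetter {
            out.push(false);
            continue;
        }
        // Final rule: otherwise break.
        out.push(true);
    }
    out
}
```

With this sketch, `breaks(&[Wb::ALetter, Wb::Extend, Wb::ALetter])` reports no break at either boundary: the Extend is attached by WB4, and WB5 then sees ALetter on both sides. The rewrite never reaches back into WB3, WB3a or WB3b, which is the sense of "subsequent" here.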

Re: UAX #29: Ambiguities in WB4, and contributing back testcases

2016-12-22 Thread Manish Goregaokar
> Why don't you have the same problem when you determine word breaks in CR
> Extend LF?

By rule WB4, we don't break between CR and Extend, and treat the CRxExtend aggregate as CR, and that in turn doesn't break with LF by WB3. The rule states that we "treat whatever is on the left side (X

Re: UAX #29: Ambiguities in WB4, and contributing back testcases

2016-12-22 Thread Richard Wordingham
On Wed, 21 Dec 2016 15:24:21 -0800 Manish Goregaokar wrote:
> Aside from that, WB4's[6] greediness is underspecified. In previous
> versions, the rule was
> However, now the rule is
>
> > X (Extend | Format | ZWJ)* → X
>
> The problem here is that ZWJ appears in the

UAX #29: Ambiguities in WB4, and contributing back testcases

2016-12-21 Thread Manish Goregaokar
Hi,

We've been implementing[1] the Unicode 9 version of UAX #29[2] in Rust, and came across some ambiguities and issues.

One issue is that the tests[3] are a bit lacking. They don't handle cases with multiple flag emoji, for example (the handling of which changed since Unicode 8). We have a