The incorrect presence of break opportunities at SOT in LineBreakTest.txt is a 
known issue, documented in the erratum dated 2008-April-28 at 
http://www.unicode.org/errata/. The correct result at SOT is no break, in 
accordance with rule LB2.
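For illustration, here is a minimal Python sketch of what the erratum means for anyone consuming the test data. It parses one data line of LineBreakTest.txt and forces the start-of-text position to "no break", as LB2 requires. It assumes the file's usual layout: hex code points separated by ÷ (break opportunity) and × (no break), followed by a # comment. The function name parse_test_line is hypothetical.

```python
# Hypothetical helper: parse one LineBreakTest.txt data line and apply LB2 at SOT.
# Assumed line format: "÷ 0023 × 0020 ÷ 0023 ÷\t# comment", where "÷" (U+00F7)
# marks a break opportunity, "×" (U+00D7) marks no break, and every other
# field before "#" is a code point in hex.

BREAK = "\u00f7"     # ÷
NO_BREAK = "\u00d7"  # ×


def parse_test_line(line: str) -> tuple[str, list[bool]]:
    """Return (text, breaks); breaks[i] is True if a break is allowed before
    position i of text, and breaks[len(text)] is the end-of-text position."""
    fields = line.split("#", 1)[0].split()  # drop the trailing comment
    chars: list[str] = []
    breaks: list[bool] = []
    for field in fields:
        if field == BREAK:
            breaks.append(True)
        elif field == NO_BREAK:
            breaks.append(False)
        else:
            chars.append(chr(int(field, 16)))
    # LB2: never break at the start of text. Test files affected by the
    # erratum mark SOT with "÷"; normalize it to "no break".
    if breaks:
        breaks[0] = False
    return "".join(chars), breaks
```

Overriding SOT this way is harmless for test lines that already mark it with ×, so a test harness could apply it unconditionally.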

Regards,
L.

From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Konstantin Ritt
Sent: Tuesday, June 08, 2010 8:10 AM
To: Mark Davis ☕
Cc: Asmus Freytag; Masaaki Shibata; unicode@unicode.org
Subject: Re: Questionable lines on LineBreakTest.txt

2010/6/8 Mark Davis ☕ <m...@macchiato.com>

> If the test files are "known to be in error", then those "known" cases need 
> to be actually communicated back to the UTC; sitting on them doesn't do 
> anyone any good.
>
> I have not had a chance to investigate, but this particular case may be 
> covered by the description in 
> http://unicode.org/Public/6.0.0/ucd/auxiliary/LineBreakTest-6.0.0d4.html:
>
> The Line Break tests use tailoring of numbers described in Example 7 of 
> Section 8.2 Examples of Customization.


Indeed.
LB24 says: The default line breaking algorithm approximates this with the 
following (LB25) rule. Note that some cases have already been handled, such as 
‘9,’, ‘[9’. For a tailoring that supports the regular expression directly, as 
well as a key to the notation see Section 8.2, Examples of Customization.

And there is a note in the LineBreakTest*.txt files: Note: The Line Break tests use 
tailoring of numbers described in Example 7 of Section 8.2 Examples of 
Customization. They also differ from the results produced by a pair table 
implementation in sequences like: ZW SP CL.
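As a rough illustration of what "tailoring of numbers described in Example 7" refers to, here is a Python sketch that checks whether a whole sequence of Line_Break classes forms a single number under that tailoring. The regular expression is my reading of Example 7 and should be checked against the UAX #14 revision the test file cites. is_unbreakable_number is a hypothetical helper name, and the sketch works on class names rather than on text.

```python
import re

# Assumed reading of UAX #14, Section 8.2, Example 7 (verify against the exact
# revision in use):
#   (PR | PO)? (OP | HY)? NU (NU | SY | IS)* (CL | CP)? (PR | PO)?
# Classes are joined with single spaces, so each alternative matches a whole
# class name and never part of one.
NUMBER_PATTERN = re.compile(
    r"(?:(?:PR|PO) )?(?:(?:OP|HY) )?NU(?: (?:NU|SY|IS))*(?: (?:CL|CP))?(?: (?:PR|PO))?"
)


def is_unbreakable_number(classes: list[str]) -> bool:
    """True if the entire class sequence matches the Example 7 number pattern,
    i.e. a tailoring built on it would keep the sequence together."""
    return NUMBER_PATTERN.fullmatch(" ".join(classes)) is not None
```

For example, PR NU IS NU (something like "$1,5") matches as one number, while NU AL does not. The default LB25 pair rules only approximate this, which is why the test data, built with the tailoring, can disagree with an untailored implementation.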


But I have yet another question: why does every test in LineBreakTest.txt assume 
a break opportunity at the start of text, when LB2 says "Never break at the start 
of text"? If these tests are meant for "out of context" usage, where can I read 
such a note?

Konstantin
