On 01/11/2016 03:42 PM, Karl Williamson wrote:
It appears that
http://www.unicode.org/Public/8.0.0/ucd/auxiliary/LineBreakTest.txt is
testing a tailoring rather than the default line break algorithm,
contrary to its heading "# Default Line Break Test". And
http://www.unicode.org/Public/UCD/latest/ucd/auxiliary/LineBreakTest.html
follows along.
For example, the default algorithm as shown in
http://www.unicode.org/reports/tr14/#Table2 follows LB25, which is an
approximation of the desired behavior. But the test and html don't
follow this. I suspect they are looking for the tailoring described in
http://www.unicode.org/reports/tr14/#Examples example 7.
Specifically, the test file tests for, and the html says, that a class CL
code point followed by a class PO one is an unconditional line break
opportunity, based on rule 999 (which is the same as LB31 in TR14).
In contrast, http://www.unicode.org/reports/tr14/#Table2 says that a class
CL code point followed by a class PO one is an "indirect break
opportunity": "B % A is equivalent to B × A and B SP+ ÷ A; in other words,
do not break before A, unless one or more spaces follow B." This is by
LB25 and LB18.
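
To make the difference concrete, here is a minimal sketch of how the two
readings of the CL/PO pair diverge. The names Action, DEFAULT_TABLE,
TEST_FILE_TABLE and break_allowed are hypothetical and made up for this
illustration; they are not taken from TR14 or from any library, and each
table holds only the one pair discussed here.

```python
from enum import Enum


class Action(Enum):
    DIRECT = "direct"      # B ÷ A: a break is allowed between B and A
    INDIRECT = "indirect"  # B % A: B × A, but B SP+ ÷ A


# Hypothetical one-entry tables covering only the pair discussed above.
DEFAULT_TABLE = {("CL", "PO"): Action.INDIRECT}  # Table 2, by LB25 and LB18
TEST_FILE_TABLE = {("CL", "PO"): Action.DIRECT}  # test file, by rule 999


def break_allowed(table, before, after, spaces_between):
    """Return True if a break is allowed before `after`.

    `spaces_between` is the number of SP code points separating the pair.
    """
    action = table[(before, after)]
    if action is Action.DIRECT:
        return True
    # Indirect: break only when at least one space separates the pair.
    return spaces_between > 0


print(break_allowed(DEFAULT_TABLE, "CL", "PO", 0))    # False
print(break_allowed(DEFAULT_TABLE, "CL", "PO", 1))    # True
print(break_allowed(TEST_FILE_TABLE, "CL", "PO", 0))  # True
```

The first and third calls are where the two disagree: with no intervening
space, the Table 2 reading forbids the break and the test file expects it.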
There is a discrepancy here, which could be resolved either by changing
the tests and html to follow LB25, or documenting that these are for
something above and beyond the default algorithm. (There may also be
other discrepancies that I haven't stumbled upon.)
Oops. I didn't see this statement in the html file:
"The Line Break tests use tailoring of numbers described in Example 7 of
Section 8.2 Examples of Customization. They also differ from the results
produced by a pair table implementation in sequences like: ZW SP CL."
This explains everything. Please disregard the earlier email from me.