2015-02-20 6:14 GMT+01:00 Richard Wordingham < [email protected]>:
> TUS has a whole section on the issue, namely TUS 7.0.0 Section 5.8.
> One thing that is missing is mention of the convention that a single
> newline character (or CRLF pair) is a line break whereas a doubled
> newline character denotes a paragraph break.

In that case CR or LF characters alone are not "paragraph separators" by themselves unless they are grouped together. Like NEL, they should just be considered line separators, and the terminology used in UAX 29 rule SB4 is effectively incorrect if what matters here is just the line break property. Also in that case, rule SB4 should effectively include NEL (from the C1 subset).

But as SB4 relates only to sentence breaking, that would be a problem, because simple line breaks occur extremely frequently in the middle of sentences.

What the sentence break algorithm should say is that there should first be a preprocessing step separating line breaks from paragraph breaks. That step would create custom entities (similar to collation elements, but encoded internally with a code point outside the standard code space) that rule SB4 would use instead of "Sep | CR | LF". That custom entity should be "Sep", but without a rule defining it, since there are various ways to represent paragraph breaks.
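To make the proposed preprocessing step concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not anything specified by UAX 29: the names `Break` and `preprocess` are hypothetical, the custom entities are modelled as enum members rather than out-of-range code points, and the classification rule (one newline unit is a line break; two or more, or any U+2029, is a paragraph break) is just the convention quoted above.

```python
import re
from enum import Enum


class Break(Enum):
    """Hypothetical internal entities that cannot collide with real text."""
    LINE = "line"
    PARA = "para"


# One newline unit: CRLF counts as a single unit; otherwise LF, CR, NEL,
# LINE SEPARATOR (U+2028) or PARAGRAPH SEPARATOR (U+2029).
_UNIT = re.compile("\r\n|[\n\r\x85\u2028\u2029]")
# A maximal run of consecutive newline units.
_RUN = re.compile("(?:\r\n|[\n\r\x85\u2028\u2029])+")


def preprocess(text):
    """Split text into plain strings and Break entities.

    Assumption: a single newline unit is a line break; a run of two or
    more units, or any run containing U+2029, is a paragraph break.
    """
    tokens = []
    pos = 0
    for match in _RUN.finditer(text):
        if match.start() > pos:
            tokens.append(text[pos:match.start()])
        run = match.group()
        units = _UNIT.findall(run)
        is_para = len(units) >= 2 or "\u2029" in run
        tokens.append(Break.PARA if is_para else Break.LINE)
        pos = match.end()
    if pos < len(text):
        tokens.append(text[pos:])
    return tokens


if __name__ == "__main__":
    print(preprocess("First line\r\nsame sentence.\n\nNew paragraph."))
```

With such a step in place, a sentence breaker would only need to treat `Break.PARA` as the separator in SB4, and `Break.LINE` could be handled like ordinary whitespace inside a sentence.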
_______________________________________________ Unicode mailing list [email protected] http://unicode.org/mailman/listinfo/unicode

