When UAX #9 mentions a paragraph level, it says:

> Paragraphs are divided by the Paragraph Separator or appropriate Newline Function (for guidelines on the handling of CR, LF, and CRLF, see *Section 4.4, Directionality*, and *Section 5.8, Newline Guidelines* of [Unicode <http://www.unicode.org/reports/tr41/tr41-15.html#Unicode>]). Paragraphs may also be determined by higher-level protocols: for example, the text in two different cells of a table will be in different paragraphs.
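As a rough illustration of the paragraph division quoted above (rule P1 of UAX #9), the Python sketch below splits text at every character whose Bidi_Class is B (paragraph separator), keeping each separator with the paragraph it ends. Treating CRLF as a single separator is an assumption taken from the Newline Guidelines; `split_paragraphs` is a hypothetical name, and higher-level protocols such as table cells are not modeled.

```python
import unicodedata


def split_paragraphs(text: str) -> list[str]:
    """Sketch of UAX #9 P1: split at Bidi_Class=B characters.

    Each separator stays attached to the paragraph it ends.
    Assumption: a CR immediately followed by LF counts as one separator.
    """
    paragraphs = []
    start = 0
    i = 0
    n = len(text)
    while i < n:
        if unicodedata.bidirectional(text[i]) == "B":
            end = i + 1
            # Keep a CR LF pair together as a single paragraph separator.
            if text[i] == "\r" and end < n and text[end] == "\n":
                end += 1
            paragraphs.append(text[start:end])
            start = i = end
        else:
            i += 1
    # Trailing text without a final separator is still a paragraph.
    if start < n:
        paragraphs.append(text[start:])
    return paragraphs
```

For example, `split_paragraphs("abc\r\ndef\u2029ghi")` yields three paragraphs, while U+2028 LINE SEPARATOR (Bidi_Class WS) does not end a paragraph.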
Regards,
Konstantin

2015-02-21 3:56 GMT+04:00 Philippe Verdy <[email protected]>:
> 2015-02-20 6:14 GMT+01:00 Richard Wordingham <[email protected]>:
>
>> TUS has a whole section on the issue, namely TUS 7.0.0 Section 5.8.
>> One thing that is missing is mention of the convention that a single
>> newline character (or CRLF pair) is a line break whereas a doubled
>> newline character denotes a paragraph break.
>
> In that case CR or LF characters alone are not "paragraph separators" by
> themselves unless they are grouped together. Like NEL, they should just be
> considered as line separators, and the terminology used in UAX 29 rule SB4
> is effectively incorrect if what matters here is just the linebreak
> property. And also in that case, the SB4 rule should effectively include
> NEL (from the C1 subset).
>
> But as SB4 is only related to sentence breaking, it would be a problem,
> because simple line breaks are used extremely frequently in the middle of
> sentences.
>
> What the sentence break algorithm should say is that there should first be
> a preprocessing step separating line breaks and paragraph breaks, creating
> custom entities (similar to collation elements, but encoded internally with
> a code point outside the standard space) that rule SB4 would use
> instead of "Sep | CR | LF". That custom entity should be "Sep", but without
> the rule defining it, as there are various ways to represent paragraph
> breaks.
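The preprocessing step Philippe proposes, combined with the single-versus-doubled newline convention Richard mentions, might look roughly like the Python sketch below. Everything here is an illustrative assumption rather than anything defined by UAX #29: the names `PARA_BREAK`, `LINE_BREAK`, `preprocess_breaks` and `sb4_chunks` are hypothetical; CR, LF, CRLF and NEL are assumed to be the newline functions; and lines containing only spaces are not treated as blank. Text is represented as a list of integers because Python's `chr()` rejects values above 0x10FFFF, which is exactly where the "out of the standard space" entities are placed.

```python
# Hypothetical internal entities placed just past the Unicode codespace
# (U+10FFFF), so they can never collide with a real character.
PARA_BREAK = 0x110000
LINE_BREAK = 0x110001

# Assumed set of newline functions (CR, LF, NEL); CRLF is handled below.
NEWLINES = {"\r", "\n", "\x85"}


def preprocess_breaks(text: str) -> list[int]:
    """Map text to code points plus hypothetical break entities.

    One newline function (CR, LF, CRLF or NEL) becomes LINE_BREAK;
    two or more in a row collapse into a single PARA_BREAK.
    """
    out: list[int] = []
    i, n = 0, len(text)
    while i < n:
        if text[i] not in NEWLINES:
            out.append(ord(text[i]))
            i += 1
            continue
        # Count consecutive newline functions, treating CR LF as one.
        count = 0
        while i < n and text[i] in NEWLINES:
            if text[i] == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 2
            else:
                i += 1
            count += 1
        out.append(LINE_BREAK if count == 1 else PARA_BREAK)
    return out


def sb4_chunks(entities: list[int]) -> list[list[int]]:
    """Apply only an SB4-like rule: always break after PARA_BREAK.

    LINE_BREAK is deliberately not a break opportunity here, so a single
    newline in the middle of a sentence does not split it.
    """
    chunks: list[list[int]] = []
    current: list[int] = []
    for entity in entities:
        current.append(entity)
        if entity == PARA_BREAK:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks
```

With this sketch, `sb4_chunks(preprocess_breaks("one\ntwo\n\nthree"))` gives two chunks: the single newline survives as `LINE_BREAK` inside the first chunk, and only the doubled newline forces a break. Note that UAX #29 already gives NEL (together with U+2028 and U+2029) the Sentence_Break value Sep, so a real implementation would still need to decide how those characters map onto the two entities.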
_______________________________________________ Unicode mailing list [email protected] http://unicode.org/mailman/listinfo/unicode

