On Tue, 5 Feb 2019 00:08:10 +0100 Egmont Koblinger via Unicode <unicode@unicode.org> wrote:
> Hi Eli,
>
> > Actually, UAX#9 defines "paragraph" as the chunk of text delimited
> > by paragraph separator characters. This means characters whose bidi
> > category is B, which includes Newline, the CR-LF pair on Windows,
> > U+0085 NEL, and U+2029 PARAGRAPH SEPARATOR.

It actually gives two different definitions. UAX#9 Table 4 restricts
type B to *appropriate* newline functions; not all newlines are
paragraph separators.

> Indeed, this was an oversight on my side. So, with this definition,
> every single newline character starts a new paragraph. The result of
> printf "Hello\nWorld\n" > world.txt
> is a text file consisting of two paragraphs, with 5 characters in
> each. Correct?

No, it depends on when a newline function is 'appropriate'. TUS 5.8
Rule R2b applies - 'In simple text editors, interpret any NLF the same
as LS'.

> > Actually, Emacs implements the rule that paragraphs are separated by
> > empty lines. This is documented in the Emacs manuals.
>
> That is, Emacs overrides UAX#9 and comes up with a different
> definition? Furthermore, you argue that in terminals I should follow
> Emacs's definition rather than Unicode's? Or please clarify if I
> misunderstood you here.

He's deriving 'B' from a protocol.

Richard.
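
P.S. To make the difference concrete, here is a rough Python sketch of
the three readings discussed above, applied to the printf example. The
function names are made up for illustration, and the blank-line rule is
only an approximation of what Emacs does, not its actual
implementation.

```python
import re
import unicodedata


def split_on_bidi_b(text):
    """Naive reading: every character with Bidi_Class B ends a paragraph.

    A CR LF pair is counted as a single separator.
    """
    paragraphs, current = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if unicodedata.bidirectional(ch) == "B":
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1  # swallow the LF of a CR LF pair
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        paragraphs.append("".join(current))
    return paragraphs


def split_nlf_as_ls(text):
    """R2b-style reading: an NLF is only a line break, so just U+2029 splits."""
    return [p for p in text.split("\u2029") if p]


def split_on_empty_lines(text):
    """Protocol-style reading (assumed approximation): blank lines separate."""
    return [p for p in re.split(r"\n(?:[ \t]*\n)+", text) if p.strip()]


sample = "Hello\nWorld\n"
print(split_on_bidi_b(sample))       # two paragraphs
print(split_nlf_as_ls(sample))       # one paragraph containing two lines
print(split_on_empty_lines(sample))  # one paragraph containing two lines
```

Only the first reading gives two paragraphs for the printf output; the
other two treat it as one paragraph of two lines.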