On Tue, 5 Feb 2019 00:08:10 +0100
Egmont Koblinger via Unicode <unicode@unicode.org> wrote:

> Hi Eli,
> 
> > Actually, UAX#9 defines "paragraph" as the chunk of text delimited
> > by paragraph separator characters.  This means characters whose bidi
> > category is B, which includes Newline, the CR-LF pair on Windows,
> > U+0085 NEL, and U+2029 PARAGRAPH SEPARATOR.  

It actually gives two different definitions.  Table 4 of UAX#9
restricts type B to *appropriate* newline functions; not all newlines
are paragraph separators.

> Indeed, this was an oversight on my side. So, with this definition,
> every single newline character starts a new paragraph. The result of
> printf "Hello\nWorld\n" > world.txt
> is a text file consisting of two paragraphs, with 5 characters in
> each. Correct?

No, it depends on when a newline function is 'appropriate'.  Rule R2b
in TUS Section 5.8 applies: 'In simple text editors, interpret any NLF
the same as LS'.
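
To make the difference concrete, here is a minimal Python sketch of the
two readings applied to that file.  It is only an illustration: the
function names are made up for this example, the first function simply
assumes that every character of Bidi_Class B ends a paragraph, and the
second assumes a simple text editor in which only U+2029 does.

```python
import unicodedata

PS = "\u2029"  # PARAGRAPH SEPARATOR


def paragraphs_all_class_b(text):
    """Reading 1 (assumption): every Bidi_Class B character ends a paragraph.

    CR LF is folded to a single LF first so the pair counts once.
    """
    text = text.replace("\r\n", "\n")
    paragraphs, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        paragraphs.append("".join(current))
    return paragraphs


def paragraphs_r2b(text):
    """Reading 2 (assumption): NLFs act like LS, so only PS ends a paragraph."""
    parts = text.split(PS)
    if parts and parts[-1] == "":
        parts.pop()  # a trailing PS terminates the last paragraph
    return parts


sample = "Hello\nWorld\n"
print(paragraphs_all_class_b(sample))  # two paragraphs
print(paragraphs_r2b(sample))          # one paragraph containing two lines
```

Under the first reading the file has two paragraphs; under the second
it is a single paragraph of two lines, since it contains no U+2029.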

> > Actually, Emacs implements the rule that paragraphs are separated by
> > empty lines.  This is documented in the Emacs manuals.  
> 
> That is, Emacs overrides UAX#9 and comes up with a different
> definition? Furthermore, you argue that in terminals I should follow
> Emacs's definition rather than Unicode's? Or please clarify if I
> misunderstood you here.

He's deriving 'B' from a higher-level protocol, which Table 4 of UAX#9
also allows.

Richard.
