Hi Eli,

> Actually, UAX#9 defines "paragraph" as the chunk of text delimited by
> paragraph separator characters.  This means characters whose bidi
> category is B, which includes Newline, the CR-LF pair on Windows,
> U+0085 NEL, and U+2029 PARAGRAPH SEPARATOR.

Indeed, this was an oversight on my side.

So, with this definition, every single newline character starts a new
paragraph. The result of

    printf "Hello\nWorld\n" > world.txt

is a text file consisting of two paragraphs, with 5 characters in each.
Correct?

> Actually, Emacs implements the rule that paragraphs are separated by
> empty lines.  This is documented in the Emacs manuals.

That is, Emacs overrides UAX#9 and comes up with a different definition?

Furthermore, you argue that in terminals I should follow Emacs's
definition rather than Unicode's? Or please clarify if I misunderstood
you here.

> > while Emacs itself is a viewer that treats runs between single
> > newlines as paragraphs. That is, Emacs is inconsistent with itself.
>
> Incorrect.  Emacs always treats a run of text between empty lines as a
> single paragraph, in TUTORIAL.he and everywhere else.  There's nothing
> special about TUTORIAL.he, it is just a plain text file with a few
> dozen of bidi formatting controls, needed to show the key sequences
> with weak and neutral characters in correct visual order. [...]

Thanks for the clarification, I believe it's clear to me now.

> At least with Emacs, it is not the same.  I think considering each
> line as a separate paragraph makes writing bidi plain-text documents
> that look right almost impossible, if each line ends in a newline [...]

> My personal recommendation is to adopt the empty line rule.  It's
> simple enough and gives good results IME. [...]

> I'm surprised that you describe this as such a complex problem.  I
> think you explained up-thread that terminal emulators should cope with
> lines of text arriving piecemeal, which I interpreted as meaning that
> text is stored in the emulator's memory.  Modern emulators running on
> windowed desktops also provide scroll-back buffers, and react to
> expose events.  So I think the text that is currently in the viewport,
> and also some text previously shown, are stored in memory, and can be
> consulted.

The problem is not the memory management.

Let's look at the following session:

---snip---
prompt$ cat file1.txt
This is the first human-perceived
paragraph.

And this is the
second.
prompt$ cat file2.txt
Here this is the third
paragraph.

And this one is the
fourth.
prompt$
---snip---

If you load the files into Emacs, it is perfectly aware of the contents
of the two files. It can define paragraphs however it wants to, and BiDi
the files accordingly.

The terminal emulator doesn't know what's a shell prompt, what's a
command that the user types, what's the output of that command. (You
don't know either from this snippet. Maybe I only cat'ed file1.txt, and
"prompt$ cat file2.txt" is just the sixth line of this eleven-line
file.)

In the terminal emulator's eyes, with Emacs's definition (empty line
delimited), this is one paragraph:

    prompt$ cat file1.txt
    This is the first human-perceived
    paragraph.

and this is another paragraph:

    And this is the
    second.
    prompt$ cat file2.txt
    Here this is the third
    paragraph.

and similarly for the third one.

I believe I understand your concerns with the per-line paragraph
definition, but this interpretation that I've just shown most likely
leads to even more broken behavior.

It's a really nontrivial technical problem to let the terminal emulator
know where each prompt, and/or each command's output, begins and ends.
There's work going on for letting the terminal emulator recognize the
prompts, but even if it's successful, it'll probably take 5-10 years to
reach the majority of the users. And it probably still wouldn't solve
the case of knowing the boundary between the two outputs if a
"cat file1.txt; cat file2.txt" is executed, let alone if they're
concatenated with "cat file1.txt file2.txt".

So what you're arguing for is that the default behavior should be
something that:

- is currently not implementable in a semantically correct way (to stop
  around shell prompts) due to technical limitations, and

- isn't what Unicode says.

You have not convinced me that the pros outweigh the cons.

That being said, I'm more than open to seeing such a behavior as a
future extension, subject of course to the semantic prompt stuff being
available.


cheers,
egmont
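
P.S. To make the two paragraph definitions concrete, here is a minimal Python sketch. It is only an illustration, not how Emacs or any terminal emulator implements this. The function names `uax9_paragraphs` and `empty_line_paragraphs` are made up, and the `session` text is a sample shaped like the session above, with line breaks that are my assumption. The only library call used is `unicodedata.bidirectional`, which returns the bidi class of a character.

```python
import unicodedata


def uax9_paragraphs(text):
    """Split text at characters whose bidi class is B (UAX#9 style).

    A CR immediately followed by LF is treated as a single separator.
    Empty paragraphs (from consecutive separators) are kept.
    """
    paras, current = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if unicodedata.bidirectional(ch) == "B":
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1  # swallow the LF of a CR-LF pair
            paras.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:  # trailing text without a final separator
        paras.append("".join(current))
    return paras


def empty_line_paragraphs(text):
    """Group lines into paragraphs delimited by empty lines.

    Simplification (assumption): only "\n" is handled as a line ending.
    """
    paras, current = [], []
    for line in text.split("\n"):
        if line.strip() == "":
            if current:
                paras.append(current)
                current = []
        else:
            current.append(line)
    if current:
        paras.append(current)
    return paras


# What the terminal emulator receives: prompts, commands and output mixed.
session = "\n".join([
    "prompt$ cat file1.txt",
    "This is the first human-perceived",
    "paragraph.",
    "",
    "And this is the",
    "second.",
    "prompt$ cat file2.txt",
    "Here this is the third",
    "paragraph.",
    "",
    "And this one is the",
    "fourth.",
    "prompt$",
]) + "\n"

if __name__ == "__main__":
    print(uax9_paragraphs("Hello\nWorld\n"))
    for number, para in enumerate(empty_line_paragraphs(session), start=1):
        print(f"--- paragraph {number} ---")
        print("\n".join(para))
```

With the empty line rule, the second paragraph printed for `session` runs from "And this is the" through the "prompt$ cat file2.txt" line to the end of the third human-perceived paragraph. That is the mixing of one command's output, the prompt and the next command's output that I described above.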

