> From: Egmont Koblinger <egm...@gmail.com>
> Date: Tue, 5 Feb 2019 00:08:10 +0100
> Cc: unicode@unicode.org
>
> every single newline character starts a new paragraph. The result of
>   printf "Hello\nWorld\n" > world.txt
> is a text file consisting of two paragraphs, with 5 characters in each.
> Correct?
Yes.

> > Actually, Emacs implements the rule that paragraphs are separated by
> > empty lines. This is documented in the Emacs manuals.
>
> That is, Emacs overrides UAX#9 and comes up with a different
> definition?

Yes, Emacs uses the "higher-level protocols" clause in HL1, when the
paragraph direction is to be determined from the text. (There's also
a way for the user or a Lisp program to force a certain base paragraph
direction on all paragraphs in a window that displays some text.)

> Furthermore, you argue that in terminals I should follow
> Emacs's definition rather than Unicode's?

IME, what Emacs uses gives much better results, yes.

> I believe I understand your concerns with the per-line paragraph
> definition, but this interpretation that I've just shown most likely
> leads to even more broken behavior.

I don't see how the result could be more broken, when the decisions
about base paragraph direction are made much more rarely. The places
in text where the paragraph direction will be determined under my
proposal are a small subset of the places where it will be determined
by the default UBA rules. So it will make the same mistakes as the
each-line-is-a-new-paragraph method, but there will be far fewer such
mistakes.

In addition to this theoretical argument, I have 10 years of using
this in Emacs to back me up. The only difference between Emacs and
your example is the very first paragraph.

> It's a really nontrivial technical problem to let the terminal
> emulator know where each prompt, and/or each command's output begins
> and ends. There's work going on for letting the terminal emulator
> recognize the prompts, but even if it's successful, it'll probably
> take 5-10 years to reach the majority of the users. And it probably
> still wouldn't solve the case of knowing the boundary between the two
> outputs if a "cat file1.txt; cat file2.txt" is executed, let alone if
> they're concatenated with "cat file1.txt file2.txt".

I think you are trying to find a perfect solution, and because it
probably doesn't exist, or at least is hard to come by, you conclude
that a solution that is imperfect should be rejected. But I'm not
saying my proposal is the perfect solution, just that it is better
(sometimes, way better) than the default of considering each line a
paragraph.

> So, what you're arguing for, is that the default behavior should be
> something that's:
> - currently not implementable in a semantically correct way (to stop
>   around shell prompts) due to technical limitations, and
> - isn't what Unicode says.

The first point has to do with the search for a perfect solution. My
advice is to settle for something reasonable even if it is not
perfect.

The second point is incorrect: the UBA explicitly allows the
implementation to apply higher-level protocols for paragraph
direction, see HL1 in UAX#9.

> You have not convinced me that the pros outweigh the cons.

There are no cons in my proposal that aren't already present in the
default each-line-is-a-new-paragraph rule. So even if the pros don't
outweigh the cons, the balance should be better than under the
default.

> That being said, I'm more than open to see such a behavior as a
> future extension, subject of course to the semantic prompt stuff
> being available.

I think the default should provide reasonably good display, and
each-line-is-a-new-paragraph doesn't.
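
To make the comparison concrete, here is a minimal Python sketch of the two paragraph rules discussed above. It is an illustration only, not Emacs's or any terminal's actual code. The function names and the sample text are hypothetical, and the direction detection is a simplified first-strong heuristic in the spirit of rules P2/P3 of UAX#9 (it ignores isolates and embeddings). The Hebrew text is written with escapes so that the source itself displays predictably.

```python
import unicodedata

# Hypothetical sample: a two-line Hebrew paragraph whose second line
# happens to start with a Latin word, an empty line, then an English line.
SAMPLE = (
    "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd\n"
    "Unicode \u05d4\u05d5\u05d0 \u05ea\u05e7\u05df\n"
    "\n"
    "Hello world\n"
)


def base_direction(text):
    """Simplified first-strong detection: return 'LTR', 'RTL', or None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return None  # no strong directional character found


def paragraphs_per_line(text):
    """Default rule: every newline starts a new paragraph."""
    return text.splitlines()


def paragraphs_by_empty_lines(text):
    """Higher-level protocol: paragraphs are separated by empty lines."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip() == "":
            if current:
                paragraphs.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


if __name__ == "__main__":
    # One direction decision per line, including the empty line.
    print([base_direction(p) for p in paragraphs_per_line(SAMPLE)])
    # One direction decision per empty-line-delimited paragraph.
    print([base_direction(p) for p in paragraphs_by_empty_lines(SAMPLE)])
```

Under the per-line rule the second line of the Hebrew paragraph is decided separately and comes out left-to-right because it starts with a Latin word. Under the empty-line rule the direction is decided once, at the start of the paragraph, and both lines share it. Every place where the second rule makes a decision is also a place where the first rule makes one, which is the "small subset" argument above.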