Hello Eli,

There is absolutely no problem to treat the algorithm in UAX#9 as a set of requirements, and come up with a totally different implementation that produces the same results. I think actually UAX#9 says so somewhere.

But what is, strictly speaking, not allowed is to change the requirements. One requirement of the algorithm is that when lines are broken, logically earlier characters stay on earlier lines, and logically later characters move to later lines.

In this respect, your implementation doesn't conform to UAX#9. There's an external reason for this, and an internal one. The external reason is that continuation lines in Emacs are in general just an overflow device, text in Emacs isn't supposed to be broken into lines in the same way as e.g. word processors break lines to form paragraphs. I'm not sure how much it is true (line breaks often e.g. interfere with formatting in Japanese and other languages that don't use spaces between words and don't work well with a convention of "convert a line break in the source to a space in the output"), but I think to some extent it is true.

The internal reason is the one you describe below. It may indeed be a strong reason from an implementation perspective, but from an user perspective, it's a very weak reason. Also, I don't understand it fully. You say that the Emacs display engine examines each character in turn. Assuming these are in logical order, you would just examine them up to the point where you have "about one line" of glyphs. There would indeed be a bit of back and forth there because of the interaction between bidi algorithm and glyph selection (but as far as I know, mirrored glyphs mostly have the same width as their originals). Anyway, that bit of back and forth seems to be much less of a problem than the back and forth that you get when you have to reorder over much larger distances because you're essentially considering a whole paragraph as a single line. But I'm not an expert in Emacs display engine details, so I can't say for sure.

Regards,   Martin.

On 2011/10/11 16:43, Eli Zaretskii wrote:
Date: Tue, 11 Oct 2011 10:53:39 +0900
From: "Martin J. Dürst"<[email protected]>
CC: li bo<[email protected]>, [email protected]

I might add here that 'break a line' in the Bidi algorithm is done
before actual reordering (which is done line-by-line), but after
calculating all the levels.

Please be aware that this separation of the UBA into phases makes no
sense at all in the context of Emacs display engine.  The UBA is
written from the POV of batch processing of a block of text -- you
pass in a string in logical order, and receive a reordered string in
return.  The UBA describes the processing as a series of phases, each
one of which is completed for all the characters in the block of text
before the next phase begins.

By contrast, the Emacs display engine examines the text to display one
character at a time.  For each character, it loads the necessary
display and typeface information, and then decides whether it will fit
the display line.  Then it examines the next character, and so on.  It
should be clear that processing characters one by one completely
disrupts the subdivision of the UBA into the phases that include
examination of more than that single character, let alone decisions of
where to break the line, because reordering can no longer be done
"line by line".

Let me give you just one example: if the character should be mirrored,
you cannot decide whether it fits the display line until _after_ you
know what its mirrored glyph looks like.  But mirroring is only
resolved at a very late stage of reordering, so if you want to reorder
_after_ breaking into display lines, you will have to back up and
reconsider that decision after reordering, which will slow you down.

Given these considerations, it is a small wonder that the UBA
implementation inside Emacs is _very_ different from the description
in UAX#9.  Therefore, the subdivision into phases that are on the line
and higher levels makes very little sense here, since the
implementation needed to produce an identical result while performing
a significant surgery on the algorithm description.  In effect, the
UBA implementation in Emacs treated UAX#9 as a set of requirements,
not as a high-level description of the implementation.



Reply via email to