Re: about P1 part of BIDI algorithm

Martin J. Dürst Tue, 11 Oct 2011 03:08:03 -0700

Hello Eli,

There is absolutely no problem with treating the algorithm in UAX#9 as a set of requirements and coming up with a totally different implementation that produces the same results. I think UAX#9 actually says so somewhere.

But what is, strictly speaking, not allowed is to change the requirements. One requirement of the algorithm is that when lines are broken, logically earlier characters stay on earlier lines, and logically later characters move to later lines.

In this respect, your implementation doesn't conform to UAX#9. There's an external reason for this, and an internal one. The external reason is that continuation lines in Emacs are in general just an overflow device; text in Emacs isn't supposed to be broken into lines in the same way as, e.g., word processors break lines to form paragraphs. I'm not sure how true this is (line breaks often interfere with formatting in Japanese and other languages that don't use spaces between words and don't work well with a convention of "convert a line break in the source to a space in the output"), but I think it is true to some extent.

The internal reason is the one you describe below. It may indeed be a strong reason from an implementation perspective, but from a user perspective it's a very weak reason. Also, I don't fully understand it. You say that the Emacs display engine examines each character in turn. Assuming these are in logical order, you would just examine them up to the point where you have "about one line" of glyphs. There would indeed be a bit of back and forth there because of the interaction between the bidi algorithm and glyph selection (but as far as I know, mirrored glyphs mostly have the same width as their originals). Anyway, that bit of back and forth seems to be much less of a problem than the back and forth you get when you have to reorder over much larger distances because you're essentially treating a whole paragraph as a single line. But I'm not an expert in Emacs display engine details, so I can't say for sure.


Regards,   Martin.

On 2011/10/11 16:43, Eli Zaretskii wrote:

Date: Tue, 11 Oct 2011 10:53:39 +0900
From: "Martin J. Dürst"<[email protected]>
CC: li bo<[email protected]>, [email protected]

I might add here that 'break a line' in the Bidi algorithm is done
before actual reordering (which is done line-by-line), but after
calculating all the levels.
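A minimal sketch of the ordering described above, assuming resolved embedding levels are already available for the whole paragraph: break the paragraph into lines in logical order first, then reorder each line independently with rule L2 of UAX#9 (from the highest level down to the lowest odd level, reverse every maximal run of characters at that level or higher). The helper names `reorder_line` and `display_lines`, the fixed character-count line width, and the omission of rule L1 (resetting trailing whitespace to the paragraph level) are illustrative assumptions, not Emacs code or any library's API. Uppercase letters stand in for right-to-left characters.

```python
def reorder_line(chars: str, levels: list[int]) -> str:
    """Return one line in visual order using UAX#9 rule L2.

    levels[i] is the resolved embedding level of chars[i]; rule L1
    (resetting trailing whitespace to the paragraph level) is omitted.
    """
    if not chars:
        return ""
    order = list(range(len(chars)))  # logical indices, kept in visual order
    highest = max(levels)
    lowest_odd = min(levels) | 1  # lowest level on the line, rounded up to odd
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                # Reverse this maximal run of characters at >= level.
                order[i:j] = reversed(order[i:j])
                i = j
            else:
                i += 1
    return "".join(chars[k] for k in order)


def display_lines(chars: str, levels: list[int], max_width: int) -> list[str]:
    """Break in logical order first, then reorder each line separately.

    Hypothetical line-breaking rule: a line holds at most max_width characters.
    """
    lines = []
    for start in range(0, len(chars), max_width):
        end = start + max_width
        lines.append(reorder_line(chars[start:end], levels[start:end]))
    return lines


# Uppercase letters stand in for right-to-left characters.
print(display_lines("abcDEF", [0, 0, 0, 1, 1, 1], 4))  # ['abcD', 'FE']
```

With `max_width=4`, the right-to-left `D` stays on the first line because it is logically earlier. Reordering the whole paragraph as one line first (`abcFED`) and only then wrapping would give `abcF` / `ED`, putting the logically later `F` on the first line.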


Please be aware that this separation of the UBA into phases makes no
sense at all in the context of Emacs display engine.  The UBA is
written from the POV of batch processing of a block of text -- you
pass in a string in logical order, and receive a reordered string in
return.  The UBA describes the processing as a series of phases, each
one of which is completed for all the characters in the block of text
before the next phase begins.

By contrast, the Emacs display engine examines the text to display one
character at a time.  For each character, it loads the necessary
display and typeface information, and then decides whether it will fit
the display line.  Then it examines the next character, and so on.  It
should be clear that processing characters one by one completely
disrupts the subdivision of the UBA into the phases that include
examination of more than that single character, let alone decisions of
where to break the line, because reordering can no longer be done
"line by line".

Let me give you just one example: if the character should be mirrored,
you cannot decide whether it fits the display line until _after_ you
know what its mirrored glyph looks like.  But mirroring is only
resolved at a very late stage of reordering, so if you want to reorder
_after_ breaking into display lines, you will have to back up and
reconsider that decision after reordering, which will slow you down.
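To make the mirroring point concrete: under rule L4 of UAX#9, a character is drawn with its mirrored glyph only if its resolved level is odd (right-to-left) and it has the Bidi_Mirrored property, so the glyph, and therefore its width, is not known until levels are resolved. The sketch below assumes the level is already known. `glyph_for` and the small `MIRROR_PAIRS` table are illustrative stand-ins (the full mapping is BidiMirroring.txt in the Unicode Character Database); `unicodedata.mirrored` is Python's standard-library lookup of the Bidi_Mirrored property.

```python
import unicodedata

# Hypothetical excerpt of mirroring pairs; the full mapping is
# BidiMirroring.txt in the Unicode Character Database.
MIRROR_PAIRS = {"(": ")", ")": "(", "[": "]", "]": "[",
                "<": ">", ">": "<", "\u00ab": "\u00bb", "\u00bb": "\u00ab"}


def glyph_for(ch: str, level: int) -> str:
    """Rule L4: use the mirrored glyph only if the resolved level is odd
    (right-to-left) and the character has Bidi_Mirrored=Yes."""
    if level % 2 == 1 and unicodedata.mirrored(ch):
        return MIRROR_PAIRS.get(ch, ch)
    return ch


# The same character maps to different glyphs depending on its resolved
# level, so a width-based fit decision has to wait for that level.
print(glyph_for("(", 0), glyph_for("(", 1))  # ( )
```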

Given these considerations, it is a small wonder that the UBA
implementation inside Emacs is _very_ different from the description
in UAX#9.  Therefore, the subdivision into phases that are on the line
and higher levels makes very little sense here, since the
implementation needed to produce an identical result while performing
a significant surgery on the algorithm description.  In effect, the
UBA implementation in Emacs treated UAX#9 as a set of requirements,
not as a high-level description of the implementation.

