Re: basic-hebrew RtL-space ?

Philippe Verdy Mon, 01 Nov 2004 13:41:15 -0800

From: "kefas" <[EMAIL PROTECTED]>

Inserting unicode/basic-hebrew results in a convenient
RtL, right-to-left, advance of the cursor, but the
space-character jumps to the far right.  Is there a
RtL-space?
In MS-Word and OpenOffice I can only change whole
paragraphs to RtL-entry.  But quoting just a few
words in hebrew WITHIN a paragraph would be helpful to
many.

And this is what the embedding controls (RLE is U+202B, LRE is U+202A, PDF is U+202C) are made for:
- surround an RTL subtext (Hebrew, Arabic...) within an LTR paragraph (Latin...) with an RLE/PDF pair;
- surround an LTR subtext (Latin...) within an RTL paragraph (Hebrew...) with an LRE/PDF pair.
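As a minimal sketch of the two cases above, the embedding can be done by plain string concatenation. The helper names embed_rtl and embed_ltr and the sample sentence are made up for this illustration; only the three code points come from Unicode.

```python
# Bidi embedding controls (code points defined by Unicode).
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed_rtl(text: str) -> str:
    """Wrap an RTL run (hypothetical helper) for use inside an LTR paragraph."""
    return RLE + text + PDF


def embed_ltr(text: str) -> str:
    """Wrap an LTR run (hypothetical helper) for use inside an RTL paragraph."""
    return LRE + text + PDF


# Two Hebrew words, stored in logical order, quoted inside a Latin sentence.
hebrew = "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD"
sentence = "The greeting " + embed_rtl(hebrew) + " is Hebrew."

# The controls are invisible when rendered, so print the escaped form.
print(sentence.encode("unicode_escape").decode("ascii"))
```

The space between the two Hebrew words stays an ordinary U+0020; it is the surrounding RLE/PDF pair that keeps the whole run right-to-left.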

There is no need for a separate RTL space: the regular ASCII SPACE character (U+0020) is used within all RTL texts as the standard default word separator. It has no strong directionality of its own (the Bidi algorithm classes it as neutral whitespace), so it does not force a direction break; its direction is inherited from the surrounding text.
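That U+0020 carries no strong direction of its own can be checked against the Unicode character database. The sketch below uses Python's standard unicodedata module; the choice of sample characters is arbitrary.

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the Bidi class of a single character, e.g. 'L', 'R', 'AL', 'WS'."""
    return unicodedata.bidirectional(ch)


samples = {
    "SPACE": " ",
    "LATIN CAPITAL LETTER A": "A",
    "HEBREW LETTER ALEF": "\u05D0",
    "ARABIC LETTER ALEF": "\u0627",
}

for name, ch in samples.items():
    print(f"U+{ord(ch):04X} {name}: {bidi_class(ch)}")
```

The space reports WS (whitespace, a neutral type), while the letters report the strong types L, R and AL.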

A good question, however, is whether the space should inherit its direction from the previous text or from the next one.
- If the previous text has a strong directionality, the space should inherit that direction. This should be the case every time you enter text with a space at the end: it is very disturbing to see the new space jump to the opposite side when entering space-separated Hebrew words within a Latin text, just because the editor assumes that no more Hebrew will be added on the same line. (This causes surprising editing errors, for example when creating a translation resource file in which translated resources are prefixed by an ASCII key, such as a .po file for GNU programs using gettext().)
- If the previous text in the same paragraph has no directionality, the space inherits its direction from the text after it, provided that text has a strong directionality.
- If this does not work either, a global context for the whole text should be used or, alternatively, the directionality of the end of the previous paragraph. (This influences where the cursor goes to align such a weakly-directed paragraph with the previous paragraph, including the default start margin position.)
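The three-step fallback described above could be sketched as follows. This is only an illustration of the editing heuristic proposed in this message, not the neutral-resolution rules of the Bidi algorithm itself; the function names and the fallback parameter (standing in for the "global context") are assumptions.

```python
import unicodedata

# Bidi classes treated as strong, mapped to a simple direction flag.
STRONG = {"L": "L", "R": "R", "AL": "R"}


def strong_direction(ch):
    """Return 'L' or 'R' for a strongly directional character, else None."""
    return STRONG.get(unicodedata.bidirectional(ch))


def space_direction(paragraph, pos, fallback="L"):
    """Guess the direction of the space at index pos while editing.

    1. Use the nearest strong character before the space.
    2. Otherwise use the nearest strong character after it.
    3. Otherwise use the fallback (global or previous-paragraph direction).
    """
    for ch in reversed(paragraph[:pos]):
        direction = strong_direction(ch)
        if direction:
            return direction
    for ch in paragraph[pos + 1:]:
        direction = strong_direction(ch)
        if direction:
            return direction
    return fallback


# A .po-style line being typed: ASCII key, then Hebrew, then a trailing space.
line = 'msgstr "\u05E9\u05DC\u05D5\u05DD '
print(space_direction(line, len(line) - 1))
```

With this rule the trailing space typed after the Hebrew word stays on the Hebrew side instead of jumping to the opposite end of the line.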

The regular Bidi algorithm should be used to render a complete text, but strict Bidi rules should not be applied at every moment while a text is being composed. There, the current cursor position should act as a sentence break with a strong inherited directionality; the text can then be re-resolved at this position when the cursor moves to other parts of the text.

I don't think this is an issue of renderers but of editors. In Notepad, notably, you won't know exactly where to enter a space while editing, unless you use the contextual menu that switches the global default directionality and swaps the alignment to the side margins. Sometimes, when you want to know where the LRE/RLE and PDF Bidi controls are, it is nearly impossible to determine this visually in Notepad, unless you use an external tool such as native2ascii, from the Java SDK, to change the encoding into one with clearly visible marks. It's unfortunate, given that Notepad (since Windows XP) offers a directly accessible contextual menu to enter Bidi controls and to change the global direction and alignment to the side margins. (But Notepad has a "visible controls" editing mode to resolve such ambiguities.)
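Where no such tool is at hand, the invisible controls can be made visible with a few lines of script. The sketch below is similar in spirit to the native2ascii approach but does not reproduce that tool's output format; the function name and the angle-bracket labels are made up for illustration.

```python
# Explicit Bidi controls and marks, keyed by character, with short labels.
BIDI_CONTROLS = {
    "\u200E": "LRM",  # LEFT-TO-RIGHT MARK
    "\u200F": "RLM",  # RIGHT-TO-LEFT MARK
    "\u202A": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202B": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202C": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202D": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202E": "RLO",  # RIGHT-TO-LEFT OVERRIDE
}


def show_bidi_controls(text):
    """Replace each invisible Bidi control with a visible <NAME> label."""
    return "".join(
        f"<{BIDI_CONTROLS[ch]}>" if ch in BIDI_CONTROLS else ch
        for ch in text
    )


marked = show_bidi_controls("abc \u202B\u05D0\u05D1\u202C def")
print(marked.encode("unicode_escape").decode("ascii"))
```

All other characters, including the Hebrew letters, are passed through unchanged; only the controls are labeled.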

Related: The other Hebrew characters in the alphabetic
presentation forms insert themselves in LtR-fashion?
Why this difference?
I read about Logical and Visual entry, but don't see
how that answers my 2 questions above.

Visual entry should never be used. It was used for some legacy encodings to render text on devices that don't implement the Bidi algorithm and can only render text as LTR. Nobody enters RTL text in "pseudo-visual" LTR order; only the logical input order is needed.

But don't confuse the input order with the encoding order, as they can be different. (They should not differ if the text is converted and stored in Unicode, where only the logical order is legal for any mix of Latin, Greek, Cyrillic, Hebrew and Arabic.)

The case for Thai is different because its input order is (historically) visual rather than logical, and the text is then encoded using the same (visual) order. This was not changed for Thai in Unicode, to keep compatibility with the national Thai standard TIS-620 (and its later revisions). So even though Thai uses a non-logical order, its input order and its encoding order are the same.
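The Thai case can be seen by listing the stored code points of a syllable whose vowel is written to the left of its consonant. The two-character sample below is chosen for illustration, and the helper name is made up.

```python
import unicodedata


def code_points(text):
    """Return (code point, character name) pairs in storage order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The vowel sign SARA E is displayed before the consonant KO KAI,
# and it is also typed and stored before it (visual order).
thai_syllable = "\u0E40\u0E01"

for cp, name in code_points(thai_syllable):
    print(cp, name)
```

The vowel comes first in the stored sequence even though it is pronounced after the consonant, which is what "visual order" means here.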

A difference between input and encoding orders is known mainly from historic texts created for modern Hebrew, and more rarely Arabic, or from texts encoded in a private pre-press encoding used to prepare the global layout of pages. (Such texts are processed more easily and faster in complex page layouts if they are prepared in visual order before being flowed into the page layout template; these applications use specific encodings in a richer rendering context than plain text, so this is out of the scope of the Unicode standard itself.)
