RE: Unicode editing (RE: Unicode complaints)

Marco Cimarosti Tue, 20 Mar 2001 02:13:42 -0800
Roozbeh Pournader wrote:
> Exactly what I was talking about :)

Of course it is your idea. I just thought about it over the last weekend :)

> I'm not sure about the times two
> embeddings occur exactly adjacent to each other. I have a sense that
> merging the two may have bad effects.

I am not sure that I catch what you mean here.

My simplified view was that each visual segment of text (i.e. one or more
adjacent characters at the same level) should have the opposite
directionality to the two segments around it.

The checking rules that I had in mind to ensure this were:

1) The left-most and right-most characters in each line must be either level
0 or 1;

2) The characters adjacent (in visual order) to each character must be
either the same level, one level higher, or one level lower.
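
A minimal sketch of these two checks in Python, assuming that a line is
given as a list of resolved levels in visual order (the function name
check_strict is made up for illustration):

```python
def check_strict(levels):
    """Return True if one line of visual-order levels passes both strict rules."""
    if not levels:
        return True
    # Rule 1: both the left-most and the right-most character are level 0 or 1.
    if levels[0] not in (0, 1) or levels[-1] not in (0, 1):
        return False
    # Rule 2: visually adjacent characters differ by at most one level.
    return all(abs(a - b) <= 1 for a, b in zip(levels, levels[1:]))
```

For instance, check_strict([0, 1, 2, 2, 1, 0]) passes, while
check_strict([0, 2, 1]) fails because of the jump from 0 to 2.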

But there are cases when one actually needs two visually adjacent segments
of the same directionality:

Visual: she said i need water DIAS EH
Levels: 00000000022222222222211111111
Logic:  she said <RLE>HE SAID i need water<PDF>

The adjacent levels 0 and 2 would be against my scheme, but no doubt they
are necessary. So my first idea was to add a zero-width odd-level character
(represented by "*" below) between the two adjacent even-level characters:

Visual: she said *i need water DIAS EH
Levels: 000000000122222222222211111111
Logic:  (same as above)
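
As a rough sketch of that first idea (in Python; the function name and the
use of "*" as the placeholder are only illustrative), one could walk the
line in visual order and drop a placeholder wherever two neighbours differ
by exactly two levels. Larger jumps are not handled here:

```python
def insert_separators(chars, levels, placeholder="*"):
    """Insert a placeholder between visual neighbours whose levels differ by 2."""
    out_chars, out_levels = [], []
    for i, (ch, lvl) in enumerate(zip(chars, levels)):
        if i > 0 and abs(lvl - levels[i - 1]) == 2:
            # The placeholder takes the level in between, e.g. 1 between 0 and 2.
            out_chars.append(placeholder)
            out_levels.append((lvl + levels[i - 1]) // 2)
        out_chars.append(ch)
        out_levels.append(lvl)
    return "".join(out_chars), out_levels
```

Fed with the visual string and levels of the example above, this gives back
the line with the "*" and the level 1 inserted after "she said ".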

But, on second thought, this might not really be necessary.

The rules could be relaxed this way:

1) The left-most *or* right-most character in each line (or both) must be
either level 0 or 1;

2) At least one of the characters adjacent (in visual order) to each
character must be either the same level, one level higher, or one level
lower.

3) Rule two does not necessarily apply to the left-most *or* right-most
character in each line.
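
A possible sketch of the relaxed rules in Python. It assumes a list of
resolved levels in visual order, and it reads rule 3 as exempting both end
characters from rule 2, which is only one interpretation (check_relaxed is
a made-up name):

```python
def check_relaxed(levels):
    """Return True if one line of visual-order levels passes the relaxed rules."""
    if not levels:
        return True
    # Rule 1: at least one end of the line is at level 0 or 1.
    if levels[0] not in (0, 1) and levels[-1] not in (0, 1):
        return False
    # Rules 2 and 3: every character except the two end ones needs at least
    # one visual neighbour within one level of its own.
    for i in range(1, len(levels) - 1):
        near_left = abs(levels[i] - levels[i - 1]) <= 1
        near_right = abs(levels[i] - levels[i + 1]) <= 1
        if not (near_left or near_right):
            return False
    return True
```

With the levels of the "she said" example (nine 0s, twelve 2s, eight 1s)
the check passes, although 0 and 2 are adjacent. A lone level-2 character
between two level-0 characters still fails.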

> > Multiple embedding levels would thus be visualized with a downwards
> > stack of arrows. All arrows must be shorter than the arrow that precedes
> > it, and point in the opposite direction.
> 
> That seems great! Anyone who has the time to implement the idea?

I have a vague memory that I have seen arrows of that kind in the chapter
about the bidirectional algorithm in the Unicode book. Now, I don't remember
whether they were printed or drawn with a pencil...

> But I like them to appear as:
>                            <--   <--
>                         -------------->
> the specific things getting nearer to the specified.

Yes, of course! Why did I do it the other way round?

This is also consistent with expressions like "higher level" and "lower
level".

> > I would say that two separate commands are needed to edit the levels:
> > 
> > - "Bidi Embed": adds *two* to, or subtracts *two* from, the embedding
> >   level of the selected text.
> > - "Bidi Override": adds *one* to, or subtracts *one* from, the embedding
> >   level of the selected text.
> 
> ... or the text that will follow. Something like the font selection
> mechanisms in rich text editors.

Yes, I had this in mind only for "Bidi Override". But now that you say it, I
see no reason for not extending it to the embedding.

But this should be forgotten as soon as you move the cursor away (which is
also true for font selection, BTW).
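
Just to fix ideas, the two commands applied to a selection could be sketched
like this in Python. The names and the list-of-levels representation are
assumptions, and the "text that will follow" case is left out:

```python
EMBED_STEP = 2      # "Bidi Embed" changes the level by two
OVERRIDE_STEP = 1   # "Bidi Override" changes the level by one


def shift_levels(levels, start, end, step):
    """Return a copy of levels with step added to the selection [start, end)."""
    changed = [lvl + step for lvl in levels[start:end]]
    if any(lvl < 0 for lvl in changed):
        raise ValueError("embedding level cannot go below 0")
    return levels[:start] + changed + levels[end:]
```

So shift_levels([0, 0, 0, 0], 1, 3, EMBED_STEP) gives [0, 2, 2, 0], and a
negative step undoes the command.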

> >     Visual order:   abcDEFghiJKLmno
> > These are of course very simple examples. The algorithm gets much more
> > complicated by validity checks, such as ensuring that the user doesn't do
> > nonsense embeddings (like LTR text embedded in other LTR text).
> 
> ..., which may happen when our user deletes some text. (Or, she may want
> the effect; what are these 65 levels for?)

I wonder... Weren't the good old 16 levels of 2.0 enough?

> > Moreover, complications certainly arise with cut&paste: I think that the
> > levels must be adjusted to avoid nonsense situations.
> 
> Also, in a good desktop environment, the clipboard should support this
> "text with embedding levels"; that's different from text with bidi
> controls.
> Or, on second thought, because some applications may be unaware of this
> new text type, the application should put bidi controls in the text when
> giving it to the clipboard, so good bidi editors can take that and compute
> the levels before pasting the text, and bad (!) bidi editors can treat the
> text without bothering themselves.

I don't see this as a problem. System clipboards are quite sophisticated
these days, and they can contain multiple formats for the same data.

E.g., if I copy text from my browser and paste it in my word processor, all
the formatting data (fonts, colors, etc.) are retained. But if I paste it in
my plain-text editor, they get stripped off and I am left with plain text.

This could also work with our story. If you copy text from a "WYSIWYG
Unicode" application and paste it into another, the visual data is
transmitted (along with the resolved level of each character).

But if you paste it to another kind of Unicode application, then "proper
Unicode" is generated on-the-fly, with its logical order, embedding
controls, etc.
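
The reordering half of that conversion can be sketched in Python as below.
It assumes one line of visual-order characters with their resolved levels,
and it undoes the usual run reversals by applying them from the lowest level
upwards. Working out which embedding controls to emit is not attempted here,
and the function name is made up:

```python
def visual_to_logical(chars, levels):
    """Recover logical order from visual-order characters and their levels."""
    items = list(zip(chars, levels))
    if not items:
        return "", []
    # Display reordering reverses runs from the highest level down; each
    # reversal is its own inverse, so undo them from level 1 upwards.
    for k in range(1, max(levels) + 1):
        i = 0
        while i < len(items):
            if items[i][1] >= k:
                j = i
                while j < len(items) and items[j][1] >= k:
                    j += 1
                # Reverse the maximal run of characters at level k or higher.
                items[i:j] = items[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(c for c, _ in items), [lvl for _, lvl in items]
```

On the "she said" example this turns the visual line back into
"she said HE SAID i need water", i.e. the logical text without the <RLE> and
<PDF> marks.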

Similarly when the text travels the other way round.

One caveat: when you select text, the lowest level in the selection is
arbitrary (e.g., 27 or 46). I think that this lowest level should be
adjusted to 0 or 1, and all the other levels adjusted to maintain the same
difference from the lowest level.

Similarly when you paste: the levels in the inserted string must be
increased to match the embedding context where the string is being pasted.
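
Both adjustments could look roughly like this in Python. Keeping the parity
of the lowest level (so that 27 becomes 1 and 46 becomes 0), and raising the
pasted text to the first level of the right parity at or above the context
level, are assumptions of this sketch, as are the function names:

```python
def normalize_for_copy(levels):
    """Shift levels so the lowest becomes 0 or 1, keeping its parity."""
    if not levels:
        return []
    lowest = min(levels)
    shift = lowest - (lowest % 2)   # even shift, so directions are preserved
    return [lvl - shift for lvl in levels]


def rebase_for_paste(levels, context_level):
    """Raise levels so the lowest one fits inside the given embedding context."""
    if not levels:
        return []
    lowest = min(levels)
    # Smallest level >= context_level with the same parity as the lowest one.
    target = context_level + ((context_level - lowest) % 2)
    return [lvl + (target - lowest) for lvl in levels]
```

For example, normalize_for_copy([27, 28, 27, 29]) gives [1, 2, 1, 3], and
rebase_for_paste([1, 2, 1], 2) gives [3, 4, 3].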

I also had another thought: keyboard entry could work exactly as clipboard
pasting. I mean that there is actually no reason for having directionally
neutral characters. All keys could become mini-clipboards: a character (or
sequence) already tagged with embedding levels (0 or 1).


_ Marco