Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Eli Zaretskii via Unicode
> From: Cosmin Apreutesei 
> Date: Tue, 28 Aug 2018 21:28:58 +0300
> Cc: unicode@unicode.org
> 
> > That is not so if the line ends after the whitespace: in that case the
> > whitespace is trailing, and will appear at the visual end of the
> > line.
> 
> So I should indeed remove the last logical space only at a soft break;
> if it's before a hard break, I should leave it alone.

Actually, you don't have to remove it, you could leave it.  It's only
an aesthetic issue.

> > No, it should show the space after ABC to the left of ABC,
> > i.e. immediately before the line end.
> 
> Just to make sure, this moving of the last space to the visual end of
> the line can only be experienced with a moving cursor, right? I mean
> as far as displaying goes (and as far as line width computation for
> the purposes of line wrapping goes), that space is just removed,
> right?

As I said, not necessarily.  But it is definitely there when you
reorder characters for display.

> I'm trying to infer the purpose of moving that space to the
> end of the line instead of just removing it

If you remove trailing space, you first need to know that it is
trailing.  That is the purpose of moving it.

> > What UAX#9 tells you is that you need to decide that the line will
> > wrap after the space that follows "ABC"
> 
> ... but when computing the line width I should not include the width
> of that space, right? Since it will not take up space in the box in
> the end.

If you will remove the space, then yes.
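A minimal sketch of wrapping in logical order where a space at a soft break is kept on its line but not counted toward the width. It assumes, for illustration only, one column per code point and break opportunities only after U+0020 SPACE; a real implementation would measure glyph advances and take break opportunities from UAX #14 (e.g. via libunibreak). `wrap_logical` is a hypothetical name.

```python
def wrap_logical(text: str, max_width: int) -> list[str]:
    """Greedy line wrapping in logical order (illustrative sketch).

    Assumptions: every code point is one column wide, and the only break
    opportunities are after U+0020 SPACE. A space at a soft break stays at
    the end of its line, so later steps can still see it, but it is not
    counted toward the line width.
    """
    words = text.split(" ")
    lines: list[str] = []
    current = ""
    for i, word in enumerate(words):
        # Re-attach the space that followed this word (the last word has none).
        piece = word if i == len(words) - 1 else word + " "
        candidate = current + piece
        # Measure without trailing spaces: they will either be removed or end
        # up at the visual end of the line, so they take no room in the box.
        if current and len(candidate.rstrip(" ")) > max_width:
            lines.append(current)
            current = piece
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


print(wrap_logical("ABC DEF", 3))  # ['ABC ', 'DEF']
```

With `max_width=11`, the paragraph from this thread wraps as `"لمفاتيح ABC "` and `"DEF"`: the first line only fits because its trailing space is ignored. The space is kept rather than dropped so that the reordering step can still treat it as trailing.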

> You mean it will produce this:
> 
> " ABC لمفاتيح"

Yes.


Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Cosmin Apreutesei via Unicode
Hi Philippe,

> The space encoded just before the logical end of line or linewrap (in the
> middle of the displayed line) has to be moved to the end of the physical line
> (in the paragraph direction); it should not be kept in the middle.

OK, that seems to confirm what Eli is saying and it clarifies that
sentence from UAX#9. Thanks!


Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Cosmin Apreutesei via Unicode
Hi Eli, thanks for answering! I think I'm getting closer. Just a few
more clarifications if you please.

> That is not so if the line ends after the whitespace: in that case the
> whitespace is trailing, and will appear at the visual end of the
> line.

So I should indeed remove the last logical space only at a soft break;
if it's before a hard break, I should leave it alone.

> Only if you add some character after the whitespace will the
> whitespace "jump" to the other side of the word.

... because the hard break just turned into a soft break and the newly
typed character will appear on the next line with a hard line break
after it, right?

> No, it should show the space after ABC to the left of ABC,
> i.e. immediately before the line end.

Just to make sure, this moving of the last space to the visual end of
the line can only be experienced with a moving cursor, right? I mean
as far as displaying goes (and as far as line width computation for
the purposes of line wrapping goes), that space is just removed,
right?  I'm trying to infer the purpose of moving that space to the
end of the line instead of just removing it: is the idea to always
provide a cursor at the visual end of the line so that typing can
continue there or is there more to it?

> What UAX#9 tells you is that you need to decide that the line will
> wrap after the space that follows "ABC"

... but when computing the line width I should not include the width
of that space, right? Since it will not take up space in the box in
the end.

>, then reorder the line as if it
> ended after that space, which will produce this:
>
> لمفاتيح ABC
>
> (with the trailing space to the left of "ABC").  Then you should
> display "DEF" on the next line.

You mean it will produce this:

" ABC لمفاتيح"



Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Philippe Verdy via Unicode
The space encoded just before the logical end of line or linewrap (in the
middle of the displayed line) has to be moved to the end of the physical line
(in the paragraph direction); it should not be kept in the middle.

If you need to force a linewrap at a non-breaking space (because there's no
other break opportunity to wrap the line elsewhere), then treat that
non-breaking space as a regular breaking space, which will also be moved to
the end of the row (after the margin on the ending side of the paragraph),
and choose the last non-breaking space on the row. Usually, all spaces
present at linewraps (including non-breaking spaces) are compacted. But
there are other style policies that will preferably force the linewrap
after a trailing or separator punctuation, before a leading punctuation, or
just after the last unbreakable cluster that fits on the row (including in
the middle of words at an arbitrary position if there's no hyphenation
process or the script does not support hyphenation, such as sinograms and
kana).
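A sketch of the fallback order described above, under hypothetical assumptions: one column per code point, ordinary break opportunities only after U+0020 SPACE, and a single code point standing in for an "unbreakable cluster". It prefers the last regular space that fits, then the last no-break space, then a hard cut. `find_break` and `wrap` are illustrative names, not an existing API.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def find_break(text: str, start: int, max_width: int) -> int:
    """Return the logical index just past the end of the line starting at `start`.

    Hypothetical policy, one column per code point:
      1. break after the last U+0020 SPACE that keeps the line within max_width;
      2. otherwise force a break after the last NO-BREAK SPACE that fits;
      3. otherwise cut after the last code point that fits.
    """
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    limit = start + max_width
    if len(text) <= limit:
        return len(text)  # the rest of the paragraph fits
    for sep in (" ", NBSP):
        # A separator at index i (start < i <= limit) leaves text[start:i]
        # visible, which fits; the separator stays on the line as trailing space.
        i = text.rfind(sep, start + 1, limit + 1)
        if i != -1:
            return i + 1
    return limit  # no space of either kind fits: hard cut


def wrap(text: str, max_width: int) -> list[str]:
    """Split `text` into lines, in logical order, using find_break."""
    lines: list[str] = []
    start = 0
    while start < len(text):
        end = find_break(text, start, max_width)
        lines.append(text[start:end])
        start = end
    return lines


print(wrap("ABC DEF", 3))  # ['ABC ', 'DEF']
```

For `"AB\u00a0CD"` at width 3 there is no ordinary space, so the sketch breaks after the no-break space; for `"ABCDEFG"` it falls back to cutting every three code points.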

Where to insert linewraps is very fuzzy and depends on the rendering
context and capabilities of the target device (you cannot scroll a piece of
printed paper, but you can scroll a display with a scrollbar or using
navigation cursors in a width-restricted input field).

On Tue, 28 Aug 2018 at 16:34, Cosmin Apreutesei via Unicode
<unicode@unicode.org> wrote:

> Hello everyone,
>
> I'm having a bit of trouble implementing line wrapping with bidi and I
> would like to ask for some advice or hints on what is the proper way
> to do this.
>
> UAX#9 section 3.4 says that bidi reordering should be done after line
> wrapping. But in order to do line wrapping correctly I need to be able
> to visually ignore some whitespace, and I'm not sure exactly which
> whitespace must be ignored.
>
> There is this sentence in UAX#9 which provides a clue: "[...] trailing
> whitespace will appear at the visual end of the line (in the paragraph
> direction).". I'm not sure what that means, but by doing some tests
> with fribidi and libunibreak I noticed that the whitespace always
> sticks to the logical end of the word (so visually to the right for
> LTR runs and to the left for RTL runs), regardless of the base
> paragraph direction. Is it safe to use this assumption and always
> remove the whitespace at the logical end of the last word of the line?
> Or is it more complicated than that?
>
> Quick example showing the problem. The following text:
>
> لمفاتيح ABC DEF
>
> with RTL base direction would wrap (for a certain line width) as:
>
> ABC  لمفاتيح
> DEF
>
> with two spaces between the Latin and Arabic text, one from the Latin
> text and one from the Arabic text. Since the line logically ends with
> the "C" and LTR direction, I should probably remove the space
> after the "C" (and, as a rule, just remove the whitespace at the
> logical end of the word, regardless of paragraph's direction or word's
> direction). Is this the right way to do it?
>
> Screenshots attached.
>
> Thanks!
>


Re: Line wrapping of mixed LTR/RTL text

2018-08-28 Thread Eli Zaretskii via Unicode
> Date: Tue, 28 Aug 2018 13:44:58 +0300
> From: Cosmin Apreutesei via Unicode 
> 
> There is this sentence in UAX#9 which provides a clue: "[...] trailing
> whitespace will appear at the visual end of the line (in the paragraph
> direction).". I'm not sure what that means, but by doing some tests
> with fribidi and libunibreak I noticed that the whitespace always
> sticks to the logical end of the word (so visually to the right for
> LTR runs and to the left for RTL runs), regardless of the base
> paragraph direction.

That is not so if the line ends after the whitespace: in that case the
whitespace is trailing, and will appear at the visual end of the
line.  Only if you add some character after the whitespace will the
whitespace "jump" to the other side of the word.
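A simplified sketch of UAX #9 rule L1, which is what puts trailing whitespace at the visual end of the line. It takes resolved levels as input (assumed to be computed elsewhere, e.g. by a bidi library) and, as a simplification, treats only SPACE and TAB as whitespace; the actual rule also covers isolate formatting characters and whitespace before segment and paragraph separators. The function name is hypothetical.

```python
def apply_rule_l1(line: str, levels: list[int], para_level: int) -> list[int]:
    """Simplified UAX #9 rule L1 for the end of a line.

    Trailing whitespace gets the paragraph embedding level, so reordering
    moves it to the visual end of the line. Simplifications: only SPACE and
    TAB count as whitespace; separators and isolate controls are not handled.
    """
    out = list(levels)
    i = len(line) - 1
    while i >= 0 and line[i] in (" ", "\t"):
        out[i] = para_level
        i -= 1
    return out


# Line "لمفاتيح ABC " cut from "لمفاتيح ABC DEF" in an RTL paragraph (level 1):
# the Arabic word and the space after it resolve to 1, "ABC " to 2.
ar = "لمفاتيح"
line = ar + " ABC "
levels = [1] * (len(ar) + 1) + [2, 2, 2, 2]
print(apply_rule_l1(line, levels, para_level=1)[-5:])  # [1, 2, 2, 2, 1]

# After typing "D", the space is no longer trailing and keeps its level 2.
typed = ar + " ABC D"
typed_levels = [1] * (len(ar) + 1) + [2, 2, 2, 2, 2]
print(apply_rule_l1(typed, typed_levels, para_level=1) == typed_levels)  # True
```

Without the reset, the trailing space would keep level 2 and be reversed together with "ABC", landing between "ABC" and the Arabic word: the double space in the original example. Once a character follows it, the space keeps its resolved level and stays inside the left-to-right run, which is the "jump" Eli mentions.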

> Quick example showing the problem. The following text:
> 
> لمفاتيح ABC DEF
> 
> with RTL base direction would wrap (for a certain line width) as:
> 
> ABC  لمفاتيح
> DEF
> 
> with two spaces between the Latin and Arabic text, one from the Latin
> text and one from the Arabic text.

No, it should show the space after ABC to the left of ABC,
i.e. immediately before the line end.

What UAX#9 tells you is that you need to decide that the line will
wrap after the space that follows "ABC", then reorder the line as if it
ended after that space, which will produce this:

لمفاتيح ABC 

(with the trailing space to the left of "ABC").  Then you should
display "DEF" on the next line.

IOW, the correct order is:

  . find levels
  . wrap in logical order
  . reorder wrapped lines
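The three steps above can be sketched end to end as follows. This is a simplified illustration, not a conforming UAX #9 implementation: step 1 is assumed to be done elsewhere (the caller passes resolved levels for the whole paragraph), wrapping assumes one column per code point with breaks only after U+0020, and rule L1 handles only trailing SPACE and TAB. Rule L2 is implemented as stated in UAX #9. All function names are hypothetical.

```python
import re


def wrap_logical(text: str, max_width: int) -> list[str]:
    """Step 2: greedy wrap in logical order; trailing spaces are not counted."""
    lines: list[str] = []
    current = ""
    # Each segment is a word plus the spaces after it (or a run of leading spaces).
    for seg in re.findall(r"[^ ]+ *| +", text):
        if current and len((current + seg).rstrip(" ")) > max_width:
            lines.append(current)
            current = seg
        else:
            current += seg
    if current:
        lines.append(current)
    return lines


def apply_rule_l1(line: str, levels: list[int], para_level: int) -> list[int]:
    """Simplified rule L1: trailing SPACE/TAB characters get the paragraph level."""
    out = list(levels)
    i = len(line) - 1
    while i >= 0 and line[i] in (" ", "\t"):
        out[i] = para_level
        i -= 1
    return out


def reorder_line(line: str, levels: list[int]) -> str:
    """Rule L2: from the highest level down to the lowest odd level on the line,
    reverse every maximal run of characters at that level or higher."""
    if not line:
        return ""
    chars, lv = list(line), list(levels)
    lowest = min(lv)
    lowest_odd = lowest if lowest % 2 == 1 else lowest + 1
    for level in range(max(lv), lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if lv[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and lv[j] >= level:
                j += 1
            # Reverse the run and its levels together so later passes stay aligned.
            chars[i:j] = chars[i:j][::-1]
            lv[i:j] = lv[i:j][::-1]
            i = j
    return "".join(chars)


def layout_paragraph(text: str, levels: list[int], para_level: int,
                     max_width: int) -> list[str]:
    """Step 1 is assumed done: `levels` has one resolved level per code point."""
    if len(levels) != len(text):
        raise ValueError("need one level per code point")
    visual_lines = []
    start = 0
    for line in wrap_logical(text, max_width):  # step 2: wrap in logical order
        end = start + len(line)
        line_levels = apply_rule_l1(line, levels[start:end], para_level)
        visual_lines.append(reorder_line(line, line_levels))  # step 3: reorder
        start = end
    return visual_lines
```

For the paragraph from this thread with an RTL base direction, the resolved levels are 1 for the Arabic word and the space after it, and 2 for "ABC", the following space and "DEF". With `max_width=11`, `layout_paragraph` returns `" ABC "` followed by the Arabic letters in reversed storage order (visual order, left to right), then `"DEF"`: the trailing space ends up at the left edge, the visual end of an RTL line. Calling `reorder_line` on the first line without `apply_rule_l1` instead yields `"ABC  "` plus the Arabic, with the two spaces from the original question.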