Re: Line breaking and Hyphenation

2002-07-04 Thread Keiron Liddle


These (and other) problems are precisely why certain areas have been
redesigned.
Wouldn't it be better to put the effort into getting the new code to
work?

On Wed, 2002-07-03 at 10:57, Joerg Pietschmann wrote:
> I put a StringBuffer into FObjMixed to accumulate
> consecutive addCharacters() events.

This is probably a good idea in general. Sometimes the SAX events can
split text in all sorts of places, and it would be easier to handle if
all consecutive text were joined together.

> Questions:
> - Is it still worth doing major hacks in LineArea.java?

If you want to get rid of all the bugs, I would say no.

> There are additional issues with consecutive spaces which had
> been discussed here already, in particular how
>   foo <fo:inline text-decoration="underline"> bar</fo:inline>
> should be handled. Will this result in two consecutive spaces,
> one of them underlined? Has this issue been resolved meanwhile?

IIRC the space in the inline is marked and therefore this space is
retained while the other space is discarded.

> J.Pietschmann



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]




Line breaking and Hyphenation

2002-07-03 Thread Joerg Pietschmann

Hello all,
I tracked down bugs 10374, 2106 and 6042. The last
bug was caused by a simple, easy-to-fix mistake in the
hyphenation framework. Bug 10374 is unfortunately
a duplicate of 2106, not 6042, and a bit more interesting.
It is caused by the parser delivering character references
as a separate character chunk, thereby creating multiple
FOText children of the block (FObjMixed) for consecutive
text. This interferes badly with line breaking and
hyphenation. Take
  e&#x78;tensible
with room up to the "l" on the line.
This is split into three FOText objects
  e &#x78; tensible
The text is delivered separately to the line layout
algorithm. The "e" and "x" do not fill the line, but they
are not words either, so they are appended to the pendingAreas
vector. The "tensible" then overflows the line and is
passed to the hyphenation; let's say it is hyphenated
as "tensi-ble". The "tensi-" is appended without
flushing the pending areas, which are then put first into the
next line.
I put a StringBuffer into FObjMixed to accumulate
consecutive addCharacters() events. This fixes the problem
with character references, but not
  e<fo:inline>X</fo:inline>tensible
(also noted somewhere in Bugzilla as a problem).
The second fix is to flush pending areas in addWord(). This
fixes the lost characters problem but *still* does not
correctly hyphenate words split across inline FOs: only
the chunk actually overflowing the line is considered
for hyphenation.
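
To make the first fix concrete, here is a minimal sketch of the accumulation idea. It is not the actual FOP code: the class MixedContentSketch and its methods addChildElement() and end() are hypothetical stand-ins, and it uses StringBuilder where the change described above uses a StringBuffer. Chunks are buffered until a child element or the end of the parent interrupts the text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a mixed-content FO; not the real FObjMixed. */
final class MixedContentSketch {
    private final StringBuilder pendingText = new StringBuilder();
    private final List<String> children = new ArrayList<>();

    /** Mirrors a SAX characters() callback: only buffer the chunk. */
    void addCharacters(char[] data, int start, int length) {
        pendingText.append(data, start, length);
    }

    /** A child element interrupts the text, so emit what was buffered first. */
    void addChildElement(String name) {
        flushText();
        children.add("<" + name + ">");
    }

    /** Call at the end of the parent; returns text nodes and child markers. */
    List<String> end() {
        flushText();
        return children;
    }

    /** Turns the buffered chunks into one text child, if there is any text. */
    private void flushText() {
        if (pendingText.length() > 0) {
            children.add(pendingText.toString());
            pendingText.setLength(0);
        }
    }
}
```

Feeding the three chunks "e", "x" and "tensible" into addCharacters() and then calling end() yields a single text child, so the line layout sees one word. With an inline element in between, the text is still split, which is the remaining problem described above.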

More problems I noted:
- white space is handled inconsistently
- line break detection relies on white space only
- word detection for hyphenation relies on white space
  and wrongly assumes there is a white space before the
  word passed to doHyphenation()
- the LinkSet is not considered for hyphenated word parts
  in addWord, and neither for page-number-citation nor
  fo:character
- the same goes for most of overlining, line-through and vertical
  alignment
- characters are copied to FOText, and then copied *twice*
  in LineArea.layout(), one copy purely for hyphenation. During
  layout, character data is held in memory at least three times,
  possibly four times (parser buffer)

Questions:
- Is it still worth doing major hacks in LineArea.java?
- Should we consider using Unicode break properties for
  line break opportunity detection?
- How should words for hyphenation be detected?
- What happens to line breaks and word detection in case of
  * inline graphics and other definitely non-text inlines
  * inline foreign elements, like formulas
  * inline-containers containing blocks, especially blocks
    with text only
- Are there script or language dependencies to consider for
  line break and word detection?
- At which point should white-space-collapse, linefeed-treatment
  etc. be considered? Possibilities:
  * while creating FOText
  * while feeding it into the line area
  * during line area layout

Considering white-space-collapse during FOText creation has some
problems in case of successive spaces in different inline FOs.
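
On the question above about Unicode break properties: the JDK already ships a locale-aware line boundary detector in java.text.BreakIterator. The following is only a sketch of how break opportunities could be collected from it; the class BreakOpportunities and its find() method are made up for illustration, and whether this iterator is good enough for FOP is exactly the open question.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper; not part of FOP. */
final class BreakOpportunities {
    /** Returns the offsets at which the JDK line iterator reports a boundary. */
    static List<Integer> find(String text, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        // first() is the start of the text; next() returns DONE after the last boundary.
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }
}
```

The returned list always starts with 0 and ends with the text length; the offsets in between are the candidate break positions, independent of any hard-coded white space test in the layout code.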

There are additional issues with consecutive spaces which had
been discussed here already, in particular how
  foo <fo:inline text-decoration="underline"> bar</fo:inline>
should be handled. Will this result in two consecutive spaces,
one of them underlined? Has this issue been resolved meanwhile?
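
To illustrate why this needs to be decided after FOText creation, here is a sketch that collapses spaces across run boundaries. Everything in it is an assumption for illustration: the Run class, the SpaceCollapser name, and above all the policy of keeping the first space of a sequence and dropping the following ones. Keeping the space inside the inline instead would be an equally simple variant.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of FOP. */
final class SpaceCollapser {
    /** Hypothetical styled text run: the text plus an underline flag. */
    static final class Run {
        final String text;
        final boolean underlined;

        Run(String text, boolean underlined) {
            this.text = text;
            this.underlined = underlined;
        }
    }

    /**
     * Collapses consecutive spaces even when they sit in different runs.
     * Assumed policy: the first space of a sequence survives.
     */
    static List<Run> collapse(List<Run> runs) {
        List<Run> out = new ArrayList<>();
        boolean previousWasSpace = false; // carried across run boundaries
        for (Run run : runs) {
            StringBuilder kept = new StringBuilder();
            for (int i = 0; i < run.text.length(); i++) {
                char c = run.text.charAt(i);
                boolean isSpace = (c == ' ');
                if (!(isSpace && previousWasSpace)) {
                    kept.append(c);
                }
                previousWasSpace = isSpace;
            }
            out.add(new Run(kept.toString(), run.underlined));
        }
        return out;
    }
}
```

For the example above, the runs "foo " (plain) and " bar" (underlined) become "foo " and "bar" under this policy, so the surviving space is the one that is not underlined. Collapsing each FOText on its own cannot produce either result, because neither run contains two consecutive spaces by itself.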

J.Pietschmann
