Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-20 Thread J.Pietschmann
Glen Mazza wrote:
If you replace whitespace handling with replacing every occurrence of 
the letter 'A' with 'B', a similar idea, you can see what I'm getting 
at--the fo:block should be able to clean itself up (if there is such a 
property defined for the fo mandating that cleanup) prior to presenting 
itself to layout, so layout needn't be concerned with the whitespace 
handling.  The earlier this is done, preferably in flow.Block, the less 
work (and fewer instantiations) for FOText object and TextLayoutManager.
Handling space normalization before the text gets into layout might save
some work if the layout uses backtracking. Nevertheless, Text can come
from inline FOs as well, and the normalization process is sensitive to
properties on inlines, e.g. in
 fo:blockA fo:wrapper text-decoration=underline B/fo:wrapper
 /fo:block)
*both* spaces between A and B remain (the current implementation is
non-conformant in this respect).

IIRC flow.Block is parsed into multiple FOText items, each of which get 
fed into the TextLayoutManager.  I'm not certain that line breaks are 
actually being created during layout; rather, during parsing, I suspect 
the BPD is just incremented and the next line is rendered.
The inline layout managers create line break possiblities if they think
a line is full, rather similar to the maintenance branch code. The break
possibility bubles up to the nearest block layout manager, which stores
it, updates the BPD and goes ongetting further break possiblities from
the child LMs.
J.Pietschmann


Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-19 Thread Glen Mazza
J.Pietschmann wrote:

Hmmm...not that big a deal to me, but I would be inclined to keep the 
whitespace removal out of the LayoutManagers, because it is fo:block 
specific
I don't quite understand this argument. Handling space-before is also
fo:block specific. Where should this logic be put, then? 


If you replace whitespace handling with replacing every occurrence of 
the letter 'A' with 'B', a similar idea, you can see what I'm getting 
at--the fo:block should be able to clean itself up (if there is such a 
property defined for the fo mandating that cleanup) prior to presenting 
itself to layout, so layout needn't be concerned with the whitespace 
handling.  The earlier this is done, preferably in flow.Block, the less 
work (and fewer instantiations) for FOText object and TextLayoutManager.

It is fo:block specific in the sense that not all fo:blocks mandate it.

The analogy with space-before is imperfect because (1) it is really only 
usable as a trait, because it interacts with other Area objects; (2) 
unlike whitespace handling, all fo:blocks have this value, even if 0 it 
must at least be queried by layout; and (3) fo:blocks can resolve into 
multiple Area.Blocks (because they could go past one page), only the 
first of which needs to worry about space-before.  Furthermore, this 
division of fo:blocks into possibly multiple Area.blocks is decided by 
Layout.  So handling space-before is not really fo:block specific, it is 
actually Area.block-specific.

Note that
whitespace handling includes removing spaces around line breaks which
are introduced during the layout process.
IIRC flow.Block is parsed into multiple FOText items, each of which get 
fed into the TextLayoutManager.  I'm not certain that line breaks are 
actually being created during layout; rather, during parsing, I suspect 
the BPD is just incremented and the next line is rendered. 

Glen


Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-18 Thread J.Pietschmann
Glen Mazza wrote:
proper TR14 line breaking needs
a precious character LB property and a whitespace status
Darn, should be previous.

I'm not sure what you're referring to here--the TR at 
http://www.unicode.org/unicode/reports/tr14/, doesn't appear to mention 
a whitepace status or LB property per se.
In order to determine whether a line break can be inserted between two
non-space characters with optional space chars in between (which would
either be left at the end of the line or discarded), the algorithm
needs the LB properties of both the characters as well as whether
any space was encountered. The line breaking property (those AL
etc. stuff) is defined in one of the Unicode data files for each
character.
Hmmm...not that big a deal to me, but I would be inclined to keep the 
whitespace removal out of the LayoutManagers, because it is fo:block 
specific
I don't quite understand this argument. Handling space-before is also
fo:block specific. Where should this logic be put, then? Note that
whitespace handling includes removing spaces around line breaks which
are introduced during the layout process.
J.Pietschmann


Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-17 Thread J.Pietschmann
Glen Mazza wrote:
A further optimization might be to do all this before
the Block is even parsed into FOText and Inline
objects, as many spaces-only objects would end up not
even needing to be created.
This will not account for spaces to be removed around line
breaks.
My preferred solution is to combine space processing with
break calculation. This needs some sort of status passed
through to the layout managers, perhaps as part of the
layout context. But then, proper TR14 line breaking needs
a precious character LB property and a whitespace status
too, so this can be combined. The processing would be
roughly as follow:
 *for* word *in* text (separated by whitespace)
   normalize the whitespace (optimize normalization away
for some whitespace status).
   calculate TR14 breaks at the beginning of the word
   *for* TR14 break possiblities *in* word
 *if* line full
check hyphenations
return previous break possiblity
   *end for*
 *end for*
Regards
J.Pietschmann


Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-17 Thread Glen Mazza
Hello Joerg, welcome back.

J.Pietschmann wrote:

Glen Mazza wrote:

A further optimization might be to do all this before
the Block is even parsed into FOText and Inline
objects, as many spaces-only objects would end up not
even needing to be created.


This will not account for spaces to be removed around line
breaks.
I'm not sure why it wouldn't--as a whitespace removal algorithm should 
be able to take into account line breaks as well.  But even if doesn't 
account for linebreaks, you should still see a reduction in the number 
of TLM instances created, as the FOText instances white-space remove 
themselves into extinction.  It's just that the reduction would not be 
as large as desired.

But then, proper TR14 line breaking needs
a precious character LB property and a whitespace status
too, so this can be combined. 


I'm not sure what you're referring to here--the TR at 
http://www.unicode.org/unicode/reports/tr14/, doesn't appear to mention 
a whitepace status or LB property per se.  But I believe this is 
minor to your point below.

The processing would be
roughly as follow:

*for* word *in* text (separated by whitespace)
   normalize the whitespace (optimize normalization away
for some whitespace status).


Hmmm...not that big a deal to me, but I would be inclined to keep the 
whitespace removal out of the LayoutManagers, because it is fo:block 
specific (depending on the whitespace removal property) as to whether or 
not to even remove whitespace to begin with.  It would be appear ideal 
to keep this business logic out of the Layout Manager classes--instead 
just send it whitespace-normalized (or not normalized, depending on the 
removal property) text, and have TLM process either equivalently.

Another issue, maybe just hairsplitting in this case, is that if it is a 
word that you're extracting in your for-loop,  you can't subsequently 
normalize the whitespace around it, because, by definition, you've just 
taken a word.  To generalize what you're saying, I think you mean, 
each word with assorted whitespace around it--but that may be tough to 
precisely define within a for-loop. 


   calculate TR14 breaks at the beginning of the word
   *for* TR14 break possiblities *in* word
 *if* line full
check hyphenations
return previous break possiblity
   *end for*
 *end for*
This seems to make sense.  (Although this TR is rather sleep-inducing 
for me, at least--we may need to have someone else implement it! ;)

Glen



Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-07 Thread Simon Pepping
On Tue, Apr 06, 2004 at 03:47:53PM -0700, Glen Mazza wrote:
 I'm not sure exactly what you mean--are you indicating
 it might be best to pull out the interators from
 FOText and flow.Inline and have everything done within
 flow.Block?  That could work well.

Frankly, I do not mean much. It is just an observation. In my patch I
suppressed multiple white space directly in FOText, without regards
for the spec. When I followed your code I realized that
Block.handleWhiteSpace contains white space removal logic according to
the spec, and that it was very wrong to bypass that. I just wanted to
note that.

I do not think that that logic should be duplicated or moved. There is
an inefficiency here, but I do not have any good idea for
optimization.
 
Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-06 Thread Simon Pepping
On Sun, Apr 04, 2004 at 06:41:41AM -, [EMAIL PROTECTED] wrote:
 --- Additional Comments From [EMAIL PROTECTED]  2004-04-04 06:41 ---
 Anyway, please check the latest solution applied.  Here I brought in the old
 solution to solve the extra spaces problem, while keeping the new solution for
 the more commonly occurring spaces at the front of fo:text.
 
 (As stated above, I suspect  10%), so more optimization may be needed here. 
 (Perhaps an error in my logic of when RemoveA vs. RemoveB should be called.)
 
 Comments most welcome.

The code seems OK to me. 

I have a remark on the logic of nextCharCalled. It is equal to
curIndex - startIndex. Also, in remove() you should separately
consider the case nextCharCalled == 0: it should not happen.

I have looked at the output in the area tree (at output). It does not
reproduce the error in the first block of my test fo, the one with the
colored inline FOs. That may be an error of the PDF renderer. It does
reproduce the error in the second block: The second line is longer
than in the other blocks, and the last line is dropped. That may be an
error in the layout managers.

I saw no problem with the logic of RemoveA, RemoveB or
RemoveC. RemoveB does handle trailing spaces except for the last one.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-06 Thread Simon Pepping
I should add that I have realized that the rules of whitespace
handling in the FO spec are quite complicated, and that it is unwise
to handle whitespace outside of the Block.handleWhiteSpace method.

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl



Re: DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-06 Thread Glen Mazza
Thanks for looking through the code, Simon.

--- Simon Pepping [EMAIL PROTECTED] wrote:
 
 I have a remark on the logic of nextCharCalled. It
 is equal to
 curIndex - startIndex. Also, in remove() you should
 separately
 consider the case nextCharCalled == 0: it should not
 happen.
 

nextCharCalled is a terribly named variable--but I
haven't thought of anything better.  Suggestions
welcome for its renaming.  

The process of moving from RemoveA (where I can just
nicely increment the startIndex, i.e., to remove
leading spaces) to RemoveB (where I need to start
doing arraycopies, e.g., to remove spaces between
words) is activated when two or more consecutive
nextChar() iterations occur *without* a remove() call
in between.  This would indicate that we've
encountered a valid character, so we've hit a word and
now any subsequent space removal has to occur via
RemoveB method.

Whenever remove() is called during RemoveA time, I
just keep resetting it to zero for the logic above to
occur.  It may not be the best, however, as I
mentioned earlier RemoveB appears to be called too
soon too often.

I'll get all this commented soon in the code.

 I have looked at the output in the area tree (at
 output). It does not
 reproduce the error in the first block of my test
 fo, the one with the
 colored inline FOs. That may be an error of the PDF
 renderer. 

I'm unsure if this is good news or bad news.  ;-)  But
anyway, more work for us...

Glen


Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-06 Thread Glen Mazza
I'm not sure exactly what you mean--are you indicating
it might be best to pull out the interators from
FOText and flow.Inline and have everything done within
flow.Block?  That could work well.

A further optimization might be to do all this before
the Block is even parsed into FOText and Inline
objects, as many spaces-only objects would end up not
even needing to be created.

I was originally thinking of the other
direction--instead of all the bouncing code between
Block and FOText, just put all the logic in FOText,
and have FOText completely responsible for its own
whitespace removal.  Problem with this though is that
we'd have to duplicate the whitespace processing logic
into Inline as well, also, there's the issue of
whitespace between consecutive FOText and/or Inline
objects still needing to be handled.  So this probably
wouldn't be a good idea.

Thanks,
Glen


--- Simon Pepping [EMAIL PROTECTED] wrote:
 I should add that I have realized that the rules of
 whitespace
 handling in the FO spec are quite complicated, and
 that it is unwise
 to handle whitespace outside of the
 Block.handleWhiteSpace method.
 
 Regards, Simon
 
 -- 
 Simon Pepping
 home page: http://www.leverkruid.nl
 



[Fwd: DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly]

2004-04-05 Thread Chris Bowditch
Comments most welcome.

Glen,

In general I like your solution. However, I did a bit of testing with inlines 
and the effects are very serious. Valid non-whitespace characters from the 
enclosing block are simply deleted if there is whitespace at the start of the 
fo:inline.

Chris




Re: [Fwd: DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly]

2004-04-05 Thread Glen Mazza
Yes, this needs more work then.  I did not consider
fo:inlines when starting.

Still, I'm surprised that fo:inlines are within the
same FOText object (the fo:block is already parsed
into several FOText objects, roughly FOText object per
 line in the source document)--the appearance of an
fo:inline should precipitate a new FOText object,
one would think.  That could be one solution.

Glen

--- Chris Bowditch [EMAIL PROTECTED] wrote:
 
 Comments most welcome.
 
 Glen,
 
 In general I like your solution. However, I did a
 bit of testing with inlines 
 and the effects are very serious. Valid
 non-whitespace characters from the 
 enclosing block are simply deleted if there is
 whitespace at the start of the 
 fo:inline.
 
 Chris
 
 



DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-03 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
http://issues.apache.org/bugzilla/show_bug.cgi?id=27901.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=27901

[PATCH] TextCharIterator.remove() does not work properly





--- Additional Comments From [EMAIL PROTECTED]  2004-04-03 20:01 ---
I like the idea of the black list. But it is not necessary to make it
explicit. It is possible to remove multiple white space characters in
a single loop. In the attached patch I have implemented this idea in
the constructor of FOText, so that FOText contains a text without
multiple spaces.

The attached test fo file demonstrates that it works well, except when
the block is split up in several FOs. In that case text disappears.  I
have checked that that does not happen in the FOText constructor and
that TextLayoutManager.textArray is also correct. In view of the fact
that the results are different whether the inlines have color or not,
it may also be an error in the renderer.

The intermediate array caLong could be skipped if it were not
necessary to truncate ca to its exact length. This could be done if
every method and client would use node.length instead of
node.ca.length.


DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-03 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
http://issues.apache.org/bugzilla/show_bug.cgi?id=27901.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=27901

[PATCH] TextCharIterator.remove() does not work properly





--- Additional Comments From [EMAIL PROTECTED]  2004-04-03 20:03 ---
Created an attachment (id=4)
my patch


DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-03 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
http://issues.apache.org/bugzilla/show_bug.cgi?id=27901.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=27901

[PATCH] TextCharIterator.remove() does not work properly





--- Additional Comments From [EMAIL PROTECTED]  2004-04-04 06:41 ---
The BlackList method is not my favorite, because I suspect only a few percent of
extra spaces found are between words--the extra complexity appears unwarranted.
 My debugging has indicated that 95%+ of extra spaces all appear at the
beginning of the FOText object, so it is best to just increment a start index
for those.  For the few percent of errant extra spaces between words, then an
arraycopy can be called, removing the spaces one-by-one.  (Keep in mind, one
month ago we were doing an arraycopy for *every* space, even those at the
beginning of the string, so a 95% reduction in arraycopies is already a huge
improvement.)  Please advise if you've come to different conclusions--but the
complexity doesn't appear necessary to me.  HOWEVER/Note:  I've only been
working with fo:blocks, not with fo:inlines, so my above reasoning may not be
valid when those are taken into consideration.

Anyway, please check the latest solution applied.  Here I brought in the old
solution to solve the extra spaces problem, while keeping the new solution for
the more commonly occurring spaces at the front of fo:text.

The (commented-out) System.out.println's in the remove() method are useful for
visualizing the space removal, and indicate the following:

RemoveA:  Newest method, that just increments the start index until a
word is found.  No array copies needed here, very speedy.

RemoveB and RemoveC:  The previous method, ideally just activated for the
relatively rare occurrences of extra spaces between words.  Unfortunately,
RemoveB and RemoveC are called much more often (50% or so) than they should be 
(As stated above, I suspect  10%), so more optimization may be needed here. 
(Perhaps an error in my logic of when RemoveA vs. RemoveB should be called.)

Comments most welcome.

Glen


DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-02 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
http://issues.apache.org/bugzilla/show_bug.cgi?id=27901.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=27901

[PATCH] TextCharIterator.remove() does not work properly

[EMAIL PROTECTED] changed:

   What|Removed |Added

Summary|TextCharIterator.remove()   |[PATCH]
   |does not work properly  |TextCharIterator.remove()
   ||does not work properly


DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly

2004-04-02 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
http://issues.apache.org/bugzilla/show_bug.cgi?id=27901.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=27901

[PATCH] TextCharIterator.remove() does not work properly





--- Additional Comments From [EMAIL PROTECTED]  2004-04-02 22:05 ---
I hope to get back to this over the weekend--I was working the logging issue 
last weekend.