Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
Glen Mazza wrote: If you replace whitespace handling with replacing every occurrence of the letter 'A' with 'B', a similar idea, you can see what I'm getting at--the fo:block should be able to clean itself up (if there is such a property defined for the fo mandating that cleanup) prior to presenting itself to layout, so layout needn't be concerned with the whitespace handling. The earlier this is done, preferably in flow.Block, the less work (and fewer instantiations) for FOText object and TextLayoutManager. Handling space normalization before the text gets into layout might save some work if the layout uses backtracking. Nevertheless, Text can come from inline FOs as well, and the normalization process is sensitive to properties on inlines, e.g. in fo:blockA fo:wrapper text-decoration=underline B/fo:wrapper /fo:block) *both* spaces between A and B remain (the current implementation is non-conformant in this respect). IIRC flow.Block is parsed into multiple FOText items, each of which get fed into the TextLayoutManager. I'm not certain that line breaks are actually being created during layout; rather, during parsing, I suspect the BPD is just incremented and the next line is rendered. The inline layout managers create line break possiblities if they think a line is full, rather similar to the maintenance branch code. The break possibility bubles up to the nearest block layout manager, which stores it, updates the BPD and goes ongetting further break possiblities from the child LMs. J.Pietschmann
Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
J.Pietschmann wrote: Hmmm...not that big a deal to me, but I would be inclined to keep the whitespace removal out of the LayoutManagers, because it is fo:block specific I don't quite understand this argument. Handling space-before is also fo:block specific. Where should this logic be put, then? If you replace whitespace handling with replacing every occurrence of the letter 'A' with 'B', a similar idea, you can see what I'm getting at--the fo:block should be able to clean itself up (if there is such a property defined for the fo mandating that cleanup) prior to presenting itself to layout, so layout needn't be concerned with the whitespace handling. The earlier this is done, preferably in flow.Block, the less work (and fewer instantiations) for FOText object and TextLayoutManager. It is fo:block specific in the sense that not all fo:blocks mandate it. The analogy with space-before is imperfect because (1) it is really only usable as a trait, because it interacts with other Area objects; (2) unlike whitespace handling, all fo:blocks have this value, even if 0 it must at least be queried by layout; and (3) fo:blocks can resolve into multiple Area.Blocks (because they could go past one page), only the first of which needs to worry about space-before. Furthermore, this division of fo:blocks into possibly multiple Area.blocks is decided by Layout. So handling space-before is not really fo:block specific, it is actually Area.block-specific. Note that whitespace handling includes removing spaces around line breaks which are introduced during the layout process. IIRC flow.Block is parsed into multiple FOText items, each of which get fed into the TextLayoutManager. I'm not certain that line breaks are actually being created during layout; rather, during parsing, I suspect the BPD is just incremented and the next line is rendered. Glen
Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
Glen Mazza wrote: proper TR14 line breaking needs a precious character LB property and a whitespace status Darn, should be previous. I'm not sure what you're referring to here--the TR at http://www.unicode.org/unicode/reports/tr14/, doesn't appear to mention a whitepace status or LB property per se. In order to determine whether a line break can be inserted between two non-space characters with optional space chars in between (which would either be left at the end of the line or discarded), the algorithm needs the LB properties of both the characters as well as whether any space was encountered. The line breaking property (those AL etc. stuff) is defined in one of the Unicode data files for each character. Hmmm...not that big a deal to me, but I would be inclined to keep the whitespace removal out of the LayoutManagers, because it is fo:block specific I don't quite understand this argument. Handling space-before is also fo:block specific. Where should this logic be put, then? Note that whitespace handling includes removing spaces around line breaks which are introduced during the layout process. J.Pietschmann
Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
Glen Mazza wrote: A further optimization might be to do all this before the Block is even parsed into FOText and Inline objects, as many spaces-only objects would end up not even needing to be created. This will not account for spaces to be removed around line breaks. My preferred solution is to combine space processing with break calculation. This needs some sort of status passed through to the layout managers, perhaps as part of the layout context. But then, proper TR14 line breaking needs a precious character LB property and a whitespace status too, so this can be combined. The processing would be roughly as follow: *for* word *in* text (separated by whitespace) normalize the whitespace (optimize normalization away for some whitespace status). calculate TR14 breaks at the beginning of the word *for* TR14 break possiblities *in* word *if* line full check hyphenations return previous break possiblity *end for* *end for* Regards J.Pietschmann
Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
Hello Joerg, welcome back. J.Pietschmann wrote: Glen Mazza wrote: A further optimization might be to do all this before the Block is even parsed into FOText and Inline objects, as many spaces-only objects would end up not even needing to be created. This will not account for spaces to be removed around line breaks. I'm not sure why it wouldn't--as a whitespace removal algorithm should be able to take into account line breaks as well. But even if doesn't account for linebreaks, you should still see a reduction in the number of TLM instances created, as the FOText instances white-space remove themselves into extinction. It's just that the reduction would not be as large as desired. But then, proper TR14 line breaking needs a precious character LB property and a whitespace status too, so this can be combined. I'm not sure what you're referring to here--the TR at http://www.unicode.org/unicode/reports/tr14/, doesn't appear to mention a whitepace status or LB property per se. But I believe this is minor to your point below. The processing would be roughly as follow: *for* word *in* text (separated by whitespace) normalize the whitespace (optimize normalization away for some whitespace status). Hmmm...not that big a deal to me, but I would be inclined to keep the whitespace removal out of the LayoutManagers, because it is fo:block specific (depending on the whitespace removal property) as to whether or not to even remove whitespace to begin with. It would be appear ideal to keep this business logic out of the Layout Manager classes--instead just send it whitespace-normalized (or not normalized, depending on the removal property) text, and have TLM process either equivalently. Another issue, maybe just hairsplitting in this case, is that if it is a word that you're extracting in your for-loop, you can't subsequently normalize the whitespace around it, because, by definition, you've just taken a word. To generalize what you're saying, I think you mean, each word with assorted whitespace around it--but that may be tough to precisely define within a for-loop. calculate TR14 breaks at the beginning of the word *for* TR14 break possiblities *in* word *if* line full check hyphenations return previous break possiblity *end for* *end for* This seems to make sense. (Although this TR is rather sleep-inducing for me, at least--we may need to have someone else implement it! ;) Glen
Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
On Tue, Apr 06, 2004 at 03:47:53PM -0700, Glen Mazza wrote: I'm not sure exactly what you mean--are you indicating it might be best to pull out the interators from FOText and flow.Inline and have everything done within flow.Block? That could work well. Frankly, I do not mean much. It is just an observation. In my patch I suppressed multiple white space directly in FOText, without regards for the spec. When I followed your code I realized that Block.handleWhiteSpace contains white space removal logic according to the spec, and that it was very wrong to bypass that. I just wanted to note that. I do not think that that logic should be duplicated or moved. There is an inefficiency here, but I do not have any good idea for optimization. Regards, Simon -- Simon Pepping home page: http://www.leverkruid.nl
Re: DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
On Sun, Apr 04, 2004 at 06:41:41AM -, [EMAIL PROTECTED] wrote: --- Additional Comments From [EMAIL PROTECTED] 2004-04-04 06:41 --- Anyway, please check the latest solution applied. Here I brought in the old solution to solve the extra spaces problem, while keeping the new solution for the more commonly occurring spaces at the front of fo:text. (As stated above, I suspect 10%), so more optimization may be needed here. (Perhaps an error in my logic of when RemoveA vs. RemoveB should be called.) Comments most welcome. The code seems OK to me. I have a remark on the logic of nextCharCalled. It is equal to curIndex - startIndex. Also, in remove() you should separately consider the case nextCharCalled == 0: it should not happen. I have looked at the output in the area tree (at output). It does not reproduce the error in the first block of my test fo, the one with the colored inline FOs. That may be an error of the PDF renderer. It does reproduce the error in the second block: The second line is longer than in the other blocks, and the last line is dropped. That may be an error in the layout managers. I saw no problem with the logic of RemoveA, RemoveB or RemoveC. RemoveB does handle trailing spaces except for the last one. Regards, Simon -- Simon Pepping home page: http://www.leverkruid.nl
Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
I should add that I have realized that the rules of whitespace handling in the FO spec are quite complicated, and that it is unwise to handle whitespace outside of the Block.handleWhiteSpace method. Regards, Simon -- Simon Pepping home page: http://www.leverkruid.nl
Re: DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
Thanks for looking through the code, Simon. --- Simon Pepping [EMAIL PROTECTED] wrote: I have a remark on the logic of nextCharCalled. It is equal to curIndex - startIndex. Also, in remove() you should separately consider the case nextCharCalled == 0: it should not happen. nextCharCalled is a terribly named variable--but I haven't thought of anything better. Suggestions welcome for its renaming. The process of moving from RemoveA (where I can just nicely increment the startIndex, i.e., to remove leading spaces) to RemoveB (where I need to start doing arraycopies, e.g., to remove spaces between words) is activated when two or more consecutive nextChar() iterations occur *without* a remove() call in between. This would indicate that we've encountered a valid character, so we've hit a word and now any subsequent space removal has to occur via RemoveB method. Whenever remove() is called during RemoveA time, I just keep resetting it to zero for the logic above to occur. It may not be the best, however, as I mentioned earlier RemoveB appears to be called too soon too often. I'll get all this commented soon in the code. I have looked at the output in the area tree (at output). It does not reproduce the error in the first block of my test fo, the one with the colored inline FOs. That may be an error of the PDF renderer. I'm unsure if this is good news or bad news. ;-) But anyway, more work for us... Glen
Re: [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
I'm not sure exactly what you mean--are you indicating it might be best to pull out the interators from FOText and flow.Inline and have everything done within flow.Block? That could work well. A further optimization might be to do all this before the Block is even parsed into FOText and Inline objects, as many spaces-only objects would end up not even needing to be created. I was originally thinking of the other direction--instead of all the bouncing code between Block and FOText, just put all the logic in FOText, and have FOText completely responsible for its own whitespace removal. Problem with this though is that we'd have to duplicate the whitespace processing logic into Inline as well, also, there's the issue of whitespace between consecutive FOText and/or Inline objects still needing to be handled. So this probably wouldn't be a good idea. Thanks, Glen --- Simon Pepping [EMAIL PROTECTED] wrote: I should add that I have realized that the rules of whitespace handling in the FO spec are quite complicated, and that it is unwise to handle whitespace outside of the Block.handleWhiteSpace method. Regards, Simon -- Simon Pepping home page: http://www.leverkruid.nl
[Fwd: DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly]
Comments most welcome. Glen, In general I like your solution. However, I did a bit of testing with inlines and the effects are very serious. Valid non-whitespace characters from the enclosing block are simply deleted if there is whitespace at the start of the fo:inline. Chris
Re: [Fwd: DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly]
Yes, this needs more work then. I did not consider fo:inlines when starting. Still, I'm surprised that fo:inlines are within the same FOText object (the fo:block is already parsed into several FOText objects, roughly FOText object per line in the source document)--the appearance of an fo:inline should precipitate a new FOText object, one would think. That could be one solution. Glen --- Chris Bowditch [EMAIL PROTECTED] wrote: Comments most welcome. Glen, In general I like your solution. However, I did a bit of testing with inlines and the effects are very serious. Valid non-whitespace characters from the enclosing block are simply deleted if there is whitespace at the start of the fo:inline. Chris
DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://issues.apache.org/bugzilla/show_bug.cgi?id=27901. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://issues.apache.org/bugzilla/show_bug.cgi?id=27901 [PATCH] TextCharIterator.remove() does not work properly --- Additional Comments From [EMAIL PROTECTED] 2004-04-03 20:01 --- I like the idea of the black list. But it is not necessary to make it explicit. It is possible to remove multiple white space characters in a single loop. In the attached patch I have implemented this idea in the constructor of FOText, so that FOText contains a text without multiple spaces. The attached test fo file demonstrates that it works well, except when the block is split up in several FOs. In that case text disappears. I have checked that that does not happen in the FOText constructor and that TextLayoutManager.textArray is also correct. In view of the fact that the results are different whether the inlines have color or not, it may also be an error in the renderer. The intermediate array caLong could be skipped if it were not necessary to truncate ca to its exact length. This could be done if every method and client would use node.length instead of node.ca.length.
DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://issues.apache.org/bugzilla/show_bug.cgi?id=27901. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://issues.apache.org/bugzilla/show_bug.cgi?id=27901 [PATCH] TextCharIterator.remove() does not work properly --- Additional Comments From [EMAIL PROTECTED] 2004-04-03 20:03 --- Created an attachment (id=4) my patch
DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://issues.apache.org/bugzilla/show_bug.cgi?id=27901. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://issues.apache.org/bugzilla/show_bug.cgi?id=27901 [PATCH] TextCharIterator.remove() does not work properly --- Additional Comments From [EMAIL PROTECTED] 2004-04-04 06:41 --- The BlackList method is not my favorite, because I suspect only a few percent of extra spaces found are between words--the extra complexity appears unwarranted. My debugging has indicated that 95%+ of extra spaces all appear at the beginning of the FOText object, so it is best to just increment a start index for those. For the few percent of errant extra spaces between words, then an arraycopy can be called, removing the spaces one-by-one. (Keep in mind, one month ago we were doing an arraycopy for *every* space, even those at the beginning of the string, so a 95% reduction in arraycopies is already a huge improvement.) Please advise if you've come to different conclusions--but the complexity doesn't appear necessary to me. HOWEVER/Note: I've only been working with fo:blocks, not with fo:inlines, so my above reasoning may not be valid when those are taken into consideration. Anyway, please check the latest solution applied. Here I brought in the old solution to solve the extra spaces problem, while keeping the new solution for the more commonly occurring spaces at the front of fo:text. The (commented-out) System.out.println's in the remove() method are useful for visualizing the space removal, and indicate the following: RemoveA: Newest method, that just increments the start index until a word is found. No array copies needed here, very speedy. RemoveB and RemoveC: The previous method, ideally just activated for the relatively rare occurrences of extra spaces between words. Unfortunately, RemoveB and RemoveC are called much more often (50% or so) than they should be (As stated above, I suspect 10%), so more optimization may be needed here. (Perhaps an error in my logic of when RemoveA vs. RemoveB should be called.) Comments most welcome. Glen
DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://issues.apache.org/bugzilla/show_bug.cgi?id=27901. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://issues.apache.org/bugzilla/show_bug.cgi?id=27901 [PATCH] TextCharIterator.remove() does not work properly [EMAIL PROTECTED] changed: What|Removed |Added Summary|TextCharIterator.remove() |[PATCH] |does not work properly |TextCharIterator.remove() ||does not work properly
DO NOT REPLY [Bug 27901] - [PATCH] TextCharIterator.remove() does not work properly
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://issues.apache.org/bugzilla/show_bug.cgi?id=27901. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://issues.apache.org/bugzilla/show_bug.cgi?id=27901 [PATCH] TextCharIterator.remove() does not work properly --- Additional Comments From [EMAIL PROTECTED] 2004-04-02 22:05 --- I hope to get back to this over the weekend--I was working the logging issue last weekend.