Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/
On Fri, 28 Oct 2005 06:06 pm, Luca Furini wrote: Manuel Mall wrote: I added a boolean attribute in SpaceArea that is true for adjustable spaces (at the moment it is not used, but I will fix it soon). Why would you envisage this is required by the renderers? I can only speak of the PDFRenderer, as I don't know the other formats very well. Remember that this started from the bug 36238: in the pdf format, the multi-byte normal space character is not affected by the Tw operator (that sets the word adjustment) so we needed a way to apply this adjustment in another manner. The pdf format allow to specify an horizontal offset between fragments of text, so that their distance can be increased or reduced: if the font is multy-byte, using this feature we can adjust the spaces. But we need to know which spaces can be adjusted, and which cannot. If we don't wont to duplicate the logic for the space recognition, the SpaceAreas must simply have a boolean value stating whether the space is adjustable, so that the renderers won't need to look at the space and decide. I don't get that point. Isn't it enough for the renderer to know the offset for the area in question? What additional decisions would the renderer make based on the adjust flag? Or do you mean we still have the twsAdjust on the TextArea and the offset is only relative to twsAdjust? Do we really gain anything with that instead of making the offset the corrected twsAdjust value? At the moment the offset in SpaceArea and WordArea are unused, but this is how I think they could be used: if, because of the rounding in the adjustment computation, the applied adjustment is different than the needed one, the TextLM should distribute this difference (a few millipoints) among the SpaceAreas and / or WordAreas, setting their offset. I see. Not sure if offset is the correct attribute though. Its stated purpose is for offsetting in the bpd direction. Overloading it with a different meaning is questionable. Maybe offset is not the most appropriate name, even if the pdf specification calls it that way (last line in the last row of table 5.6); at least it is can be misleading in the context of fop. Instead for SpaceAreas we could use the IPD trait. Basically the LM tells the renderers via the SpaceArea 'I want an ipd mpt wide space here'. How the renderers make this space happen is up to them. Not sure if we should even consider distributing the rounding errors to the WordAreas. I would leave them with the SpaceAreas. At the moment neither the WordAreas nor the SpaceAreas have the ipd explicitly set, so it is 0 (the ipd is set for the parent TextArea): these kind of area were initially thought of as light weight areas, in order to increase the size of the area tree as little as possible, and we could even define them in a minimal fashion, so that they have no unused inherited attributes and methods. As you said a saving would only be achieved if you actually change the Area class hierarchy such that those fields are not carried by every instance. Otherwise setting ipd to 0 or anything else is not saving a single bit. Setting the ipd for each word and each space is another approach: it would give the full control on the text positioning to fop's layout enginge, while at the moment we control the overall text using the adjustment parameters. Isn't that the same if you set an offset on each of those, LM takes control and tells the renderers the detail of the positioning for each fragment including spacing? The drawback of this increased control is that the renderers would need to be more complex in order to handle the extra information. A different approach would be not to have the space areas at all and simply set space-start/space-end traits on the word areas but that may be a bit radical (in the sense of its impact on the renderers) at this stage (but see below - CJK). [1] My only concern is: if we show spaces but don't insert space characters, how would the text extracted from such an output look like? Wouldn'titbesomethinglikethis? :-) Good point - we probably need the space char in the pdf otherwise we would have some unhappy users later. Of course we could also use space-start/space-end traits on the SpaceAreas to model shrink/stretch instead of the IPD trait. May be that would be preferable? Well, this is not very different from what is now (improperly) called offset :-) Correct, it is logically the same. The renderers will use this according to their own adjustment rule: for example the PDFRenderer would add it to the text adjustment if the character is multibyte. The offset could come in handy for the cjk support (bug 36977): in this case there are no adjustable spaces, and if text is justified all the difference between line width and unadjusted character width could be handled modifying the offsets of some
Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/
Manuel Mall wrote: But we need to know which spaces can be adjusted, and which cannot. If we don't wont to duplicate the logic for the space recognition, the SpaceAreas must simply have a boolean value stating whether the space is adjustable, so that the renderers won't need to look at the space and decide. I don't get that point. Isn't it enough for the renderer to know the offset for the area in question? What additional decisions would the renderer make based on the adjust flag? Or do you mean we still have the twsAdjust on the TextArea and the offset is only relative to twsAdjust? Do we really gain anything with that instead of making the offset the corrected twsAdjust value? At the moment we still use the twsAdjust value, and the individual offset would be an additional adjustment. Maybe there is little gain, but when the font is not multi-byte this saves us from setting the offset on each adjustable SpaceArea and using it in the renderer. It's not much, both in terms of time and output length: but if there is an easy way to adjust all the spaces at once ... why should we do another way? :-) [...] So, what if we rename offset - spaceAfter? It seems to me that we are here speaking of the same thing using two different names. :-) Fair enough, I agree we do. Good! We just have to reach an agreement on this last detail, and I'll implement the changes. Regards Luca
Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/
I wrote: Manuel Mall wrote: There is no need to expose creation of the Space/Word areas directly to TextLayoutManager either. TextArea could easily expose an addWord and an addSpace method instead of the monolithic setText. In the end it probably boils down to me arguing that the setText logic currently in TextArea IMO should be in TextLayoutManager (and probably based on its data structures) because it is an operation closely coupled to layout and not to areas. Ok. Done: http://svn.apache.org/viewcvs.cgi?view=revrev=328882 I added a boolean attribute in SpaceArea that is true for adjustable spaces (at the moment it is not used, but I will fix it soon). At the moment the offset in SpaceArea and WordArea are unused, but this is how I think they could be used: if, because of the rounding in the adjustment computation, the applied adjustment is different than the needed one, the TextLM should distribute this difference (a few millipoints) among the SpaceAreas and / or WordAreas, setting their offset. The renderers will use this according to their own adjustment rule: for example the PDFRenderer would add it to the text adjustment if the character is multibyte. The offset could come in handy for the cjk support (bug 36977): in this case there are no adjustable spaces, and if text is justified all the difference between line width and unadjusted character width could be handled modifying the offsets of some special characters. Regards Luca
Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/
Manuel Mall wrote: I have a question on this. You break in TextArea the text into words based on CharUtilities.isAnySpace. Is this guaranteed to be consistent with the breaking and adjustment calculations in TextLayoutManager? I am concerned we may be using different rules for word breaking in different places. As far as consistency is concerned, I agree with you: the handling of the different kinds of spaces (breaking, non-breaking, fixed width, ...) is still quite incomplete and dispersed over different classes. Just to add another example, the CharacterLM implicitly expects its character to be a non-space character and has its own lines of code concerning the creation of the elements, while it could share the methods already called by the TextLM. Having a single, centralized class taking care of the breaking (be it a Java utility class or a Fop one) and a single, shared method implementing the creation of the elements would surely increase consistency and clarity. Somehow it doesn't feel right to me that TextLayoutManager does all the breaking and calculations and then we give the whole chunk to TextArea and it breaks it again using a possibly different algorithm but still using the adjustment value calculated by TextLayoutManager. When I was trying to fix bug 36238 I initially started modifying TextLM#createTextArea(), using the AreaInfo objects to create WordAreas and SpaceAreas, but I then decided to move the string splitting inside TextArea because: 1) if WordAreas and SpaceAreas are not directly created by the LMs, there is no need to change a single line of code inside the classes creating TextAreas; this is not a real reason supporting the choice, just an handy consequence of it; 2) if TextArea still provides a getText() method, the renderers are not forced to render the text word by word and space by space if their word spacing treatment is not affected by multi-byte characters; but once again, this is not a real reason as we could provide this method anyway; 3) although both SpaceArea and WordArea hava an offset attribute it is ATM not used, so these areas does not carry any formatting information; their only purpose is to highlight spaces, thus allowing some specific renderer to handle them correctly regardless of their encoding; in other words, we are not losing braking and calculations, we simply do not need them anymore as we already know exactly which text will be placed in each line, and how wide it will be once it's correctly adjusted; 4) the text that will be placed in a line cannot be directly taken from textArray (in the TextLM), and the string str should be used instead anyway, as it may be different from the concatenation of the single pieces of text; at the moment the only difference concerns the hyphenation character - added at the end of the line, but I suspect that in different languages there could be other differences; so, we cannot simply create a WordAreas for each AreaInfo object. So, if you find it strange to break the text, put it together and split it again, me too! :-) But this initial feeling disappeared when I realized that the final splitting does not involve breaking in its proper sense, but just classification of characters. This is why I did what I did; if I did not manage to convince you ... you can try and convince me! :-) Regards Luca
Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/
On Wed, 26 Oct 2005 05:15 pm, Luca Furini wrote: Manuel Mall wrote: I have a question on this. You break in TextArea the text into words based on CharUtilities.isAnySpace. Is this guaranteed to be consistent with the breaking and adjustment calculations in TextLayoutManager? I am concerned we may be using different rules for word breaking in different places. As far as consistency is concerned, I agree with you: the handling of the different kinds of spaces (breaking, non-breaking, fixed width, ...) is still quite incomplete and dispersed over different classes. Just to add another example, the CharacterLM implicitly expects its character to be a non-space character and has its own lines of code concerning the creation of the elements, while it could share the methods already called by the TextLM. Having a single, centralized class taking care of the breaking (be it a Java utility class or a Fop one) and a single, shared method implementing the creation of the elements would surely increase consistency and clarity. Somehow it doesn't feel right to me that TextLayoutManager does all the breaking and calculations and then we give the whole chunk to TextArea and it breaks it again using a possibly different algorithm but still using the adjustment value calculated by TextLayoutManager. When I was trying to fix bug 36238 I initially started modifying TextLM#createTextArea(), using the AreaInfo objects to create WordAreas and SpaceAreas, but I then decided to move the string splitting inside TextArea because: 1) if WordAreas and SpaceAreas are not directly created by the LMs, there is no need to change a single line of code inside the classes creating TextAreas; this is not a real reason supporting the choice, just an handy consequence of it; 2) if TextArea still provides a getText() method, the renderers are not forced to render the text word by word and space by space if their word spacing treatment is not affected by multi-byte characters; but once again, this is not a real reason as we could provide this method anyway; 3) although both SpaceArea and WordArea hava an offset attribute it is ATM not used, so these areas does not carry any formatting information; their only purpose is to highlight spaces, thus allowing some specific renderer to handle them correctly regardless of their encoding; in other words, we are not losing braking and calculations, we simply do not need them anymore as we already know exactly which text will be placed in each line, and how wide it will be once it's correctly adjusted; 4) the text that will be placed in a line cannot be directly taken from textArray (in the TextLM), and the string str should be used instead anyway, as it may be different from the concatenation of the single pieces of text; at the moment the only difference concerns the hyphenation character - added at the end of the line, but I suspect that in different languages there could be other differences; so, we cannot simply create a WordAreas for each AreaInfo object. So, if you find it strange to break the text, put it together and split it again, me too! :-) But this initial feeling disappeared when I realized that the final splitting does not involve breaking in its proper sense, but just classification of characters. This is why I did what I did; if I did not manage to convince you ... you can try and convince me! :-) I must admit you haven't convinced me. The basic premise still is TextLayoutManager does all the calculations including determining the number of word spaces and the resulting adjustment, that means it must know where the word spaces are. Why should TextArea recalculate the positions (and wrong as well because isAnySpace() tests for 7 different UNICODE space values not all of them adjustables spaces while TextLayoutManager uses a much smaller set to calculate the adjustment values)? There is no need to expose creation of the Space/Word areas directly to TextLayoutManager either. TextArea could easily expose an addWord and an addSpace method instead of the monolithic setText. In the end it probably boils down to me arguing that the setText logic currently in TextArea IMO should be in TextLayoutManager (and probably based on its data structures) because it is an operation closely coupled to layout and not to areas. Regards Luca BTW, it would also be really nice to have test cases for this new feature even if just expanding existing test cases to test for the new areas created. It would make catching regressions down the track much easier. Cheers Manuel
Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/
On 26.10.2005 13:05:26 Manuel Mall wrote: snip/ There is no need to expose creation of the Space/Word areas directly to TextLayoutManager either. TextArea could easily expose an addWord and an addSpace method instead of the monolithic setText. In the end it probably boils down to me arguing that the setText logic currently in TextArea IMO should be in TextLayoutManager (and probably based on its data structures) because it is an operation closely coupled to layout and not to areas. FWIW, I agree with Manuel that the new logic in TextArea shouldn't be there. The area tree should simply be a data structure, nothing more. Splitting functionality in too many places is dangerous. Regards Luca BTW, it would also be really nice to have test cases for this new feature even if just expanding existing test cases to test for the new areas created. It would make catching regressions down the track much easier. +1 to that. The test cases are very important to document what we can do besides checking for regressions. I know this is additional work, especially hard for those who have very little time available to them, but the tests are something that is extremely valuable to improve the quality of our package. Jeremias Maerki
Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/
On Tue, 25 Oct 2005 11:14 pm, [EMAIL PROTECTED] wrote: Author: lfurini Date: Tue Oct 25 08:14:10 2005 New Revision: 328381 URL: http://svn.apache.org/viewcvs?rev=328381view=rev Log: Fix for bug 36238 (at least in the PDFRenderer) The text is split into WordAreas and SpaceAreas: the latters (when the font is multibyte) are not affected by the Tw operator, so they are shifted in the inline progression dimension instead. The behaviour of the other renderers should be the same as before this change. Luca, I have a question on this. You break in TextArea the text into words based on CharUtilities.isAnySpace. Is this guaranteed to be consistent with the breaking and adjustment calculations in TextLayoutManager? I am concerned we may be using different rules for word breaking in different places. Somehow it doesn't feel right to me that TextLayoutManager does all the breaking and calculations and then we give the whole chunk to TextArea and it breaks it again using a possibly different algorithm but still using the adjustment value calculated by TextLayoutManager. May be I am too picky here or misunderstand something - sorry. Cheers Manuel