Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/

2005-10-28 Thread Manuel Mall
On Fri, 28 Oct 2005 06:06 pm, Luca Furini wrote:
 Manuel Mall wrote:
   I added a boolean attribute in SpaceArea that is true for
   adjustable spaces (at the moment it is not used, but I will fix
   it soon).
 
  Why would you envisage this is required by the renderers?

 I can only speak of the PDFRenderer, as I don't know the other
 formats very well.

 Remember that this started from the bug 36238: in the pdf format, the
 multi-byte normal space character is not affected by the Tw
 operator (that sets the word adjustment) so we needed a way to apply
 this adjustment in another manner. The pdf format allow to specify an
 horizontal offset between fragments of text, so that their distance
 can be increased or reduced: if the font is multy-byte, using this
 feature we can adjust the spaces.

 But we need to know which spaces can be adjusted, and which cannot.
 If we don't wont to duplicate the logic for the space recognition,
 the SpaceAreas must simply have a boolean value stating whether the
 space is adjustable, so that the renderers won't need to look at the
 space and decide.

I don't get that point. Isn't it enough for the renderer to know the 
offset for the area in question? What additional decisions would the 
renderer make based on the adjust flag? Or do you mean we still have 
the twsAdjust on the TextArea and the offset is only relative to 
twsAdjust? Do we really gain anything with that instead of making the 
offset the corrected twsAdjust value?


   At the moment the offset in SpaceArea and WordArea are unused,
   but this is how I think they could be used: if, because of the
   rounding in the adjustment computation, the applied adjustment is
   different than the needed one, the TextLM should distribute this
   difference (a few millipoints) among the SpaceAreas and / or
   WordAreas, setting their offset.
 
  I see. Not sure if offset is the correct attribute though. Its
  stated purpose is for offsetting in the bpd direction. Overloading
  it with a different meaning is questionable.

 Maybe offset is not the most appropriate name, even if the pdf
 specification calls it that way (last line in the last row of table
 5.6); at least it is can be misleading in the context of fop.

  Instead for SpaceAreas we could use the IPD trait. Basically the LM
  tells the renderers via the SpaceArea 'I want an ipd mpt wide
  space here'. How the renderers make this space happen is up to
  them. Not sure if we should even consider distributing the rounding
  errors to the WordAreas. I would leave them with the SpaceAreas.

 At the moment neither the WordAreas nor the SpaceAreas have the ipd
 explicitly set, so it is 0 (the ipd is set for the parent TextArea):
 these kind of area were initially thought of as light weight areas,
 in order to increase the size of the area tree as little as possible,
 and we could even define them in a minimal fashion, so that they have
 no unused inherited attributes and methods.
As you said a saving would only be achieved if you actually change the 
Area class hierarchy such that those fields are not carried by every 
instance. Otherwise setting ipd to 0 or anything else is not saving a 
single bit.


 Setting the ipd for each word and each space is another approach: it
 would give the full control on the text positioning to fop's layout
 enginge, while at the moment we control the overall text using the
 adjustment parameters.
Isn't that the same if you set an offset on each of those, LM takes 
control and tells the renderers the detail of the positioning for each 
fragment including spacing?


 The drawback of this increased control is that the renderers would
 need to be more complex in order to handle the extra information.

  A different approach would be not to have the space areas at all
  and simply set space-start/space-end traits on the word areas but
  that may be a bit radical (in the sense of its impact on the
  renderers) at this stage (but see below - CJK).

 [1] My only concern is: if we show spaces but don't insert space
 characters, how would the text extracted from such an output look
 like? Wouldn'titbesomethinglikethis? :-)

Good point - we probably need the space char in the pdf otherwise we 
would have some unhappy users later.


  Of course we could also use space-start/space-end traits on the
  SpaceAreas to model shrink/stretch instead of the IPD trait. May be
  that would be preferable?

 Well, this is not very different from what is now (improperly) called
 offset :-)
Correct, it is logically the same.


   The renderers will use this according to their own adjustment
   rule: for example the PDFRenderer would add it to the text
   adjustment if the character is multibyte.
  
   The offset could come in handy for the cjk support (bug 36977):
   in this case there are no adjustable spaces, and if text is
   justified all the difference between line width and unadjusted
   character width could be handled modifying the offsets of some
   

Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/

2005-10-28 Thread Luca Furini

Manuel Mall wrote:


 But we need to know which spaces can be adjusted, and which cannot.
 If we don't wont to duplicate the logic for the space recognition,
 the SpaceAreas must simply have a boolean value stating whether the
 space is adjustable, so that the renderers won't need to look at the
 space and decide.

I don't get that point. Isn't it enough for the renderer to know the 
offset for the area in question? What additional decisions would the 
renderer make based on the adjust flag? Or do you mean we still have 
the twsAdjust on the TextArea and the offset is only relative to 
twsAdjust? Do we really gain anything with that instead of making the 
offset the corrected twsAdjust value?


At the moment we still use the twsAdjust value, and the individual offset 
would be an additional adjustment. Maybe there is little gain, but when 
the font is not multi-byte this saves us from setting the offset on each 
adjustable SpaceArea and using it in the renderer. It's not much, both in 
terms of time and output length: but if there is an easy way to adjust all 
the spaces at once ... why should we do another way? :-)



[...]

 So, what if we rename offset - spaceAfter? It seems to me that we
 are here speaking of the same thing using two different names. :-)

Fair enough, I agree we do.


Good!
We just have to reach an agreement on this last detail, and I'll implement 
the changes.


Regards
Luca


Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/

2005-10-27 Thread Luca Furini

I wrote:


Manuel Mall wrote:

 There is no need to expose creation of the Space/Word areas directly 
 to TextLayoutManager either. TextArea could easily expose an addWord 
 and an addSpace method instead of the monolithic setText. In the end 
 it probably boils down to me arguing that the setText logic currently 
 in TextArea IMO should be in TextLayoutManager (and probably based on 
 its data structures) because it is an operation closely coupled to 
 layout and not to areas.


Ok.


Done:
  http://svn.apache.org/viewcvs.cgi?view=revrev=328882

I added a boolean attribute in SpaceArea that is true for adjustable 
spaces (at the moment it is not used, but I will fix it soon).


At the moment the offset in SpaceArea and WordArea are unused, but this is 
how I think they could be used: if, because of the rounding in the 
adjustment computation, the applied adjustment is different than the 
needed one, the TextLM should distribute this difference (a few 
millipoints) among the SpaceAreas and / or WordAreas, setting their 
offset.


The renderers will use this according to their own adjustment rule: for 
example the PDFRenderer would add it to the text adjustment if the 
character is multibyte.


The offset could come in handy for the cjk support (bug 36977): in this 
case there are no adjustable spaces, and if text is justified all the 
difference between line width and unadjusted character width could be 
handled modifying the offsets of some special characters.


Regards
Luca




Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/

2005-10-26 Thread Luca Furini

Manuel Mall wrote:

I have a question on this. You break in TextArea the text into words 
based on CharUtilities.isAnySpace. Is this guaranteed to be consistent 
with the breaking and adjustment calculations in TextLayoutManager? I am 
concerned we may be using different rules for word breaking in different 
places.


As far as consistency is concerned, I agree with you: the handling of the 
different kinds of spaces (breaking, non-breaking, fixed width, ...) is 
still quite incomplete and dispersed over different classes. Just to add 
another example, the CharacterLM implicitly expects its character to be 
a non-space character and has its own lines of code concerning the 
creation of the elements, while it could share the methods already called 
by the TextLM.


Having a single, centralized class taking care of the breaking (be it a 
Java utility class or a Fop one) and a single, shared method implementing 
the creation of the elements would surely increase consistency and 
clarity.


Somehow it doesn't feel right to me that TextLayoutManager does all the 
breaking and calculations and then we give the whole chunk to TextArea 
and it breaks it again using a possibly different algorithm but still 
using the adjustment value calculated by TextLayoutManager.


When I was trying to fix bug 36238 I initially started modifying 
TextLM#createTextArea(), using the AreaInfo objects to create WordAreas 
and SpaceAreas, but I then decided to move the string splitting inside 
TextArea because:


1) if WordAreas and SpaceAreas are not directly created by the LMs, there 
is no need to change a single line of code inside the classes creating 
TextAreas; this is not a real reason supporting the choice, just an 
handy consequence of it;


2) if TextArea still provides a getText() method, the renderers are not 
forced to render the text word by word and space by space if their word 
spacing treatment is not affected by multi-byte characters; but once 
again, this is not a real reason as we could provide this method anyway;


3) although both SpaceArea and WordArea hava an offset attribute it is 
ATM not used, so these areas does not carry any formatting information; 
their only purpose is to highlight spaces, thus allowing some specific 
renderer to handle them correctly regardless of their encoding; in other 
words, we are not losing braking and calculations, we simply do not need 
them anymore as we already know exactly which text will be placed in each 
line, and how wide it will be once it's correctly adjusted;


4) the text that will be placed in a line cannot be directly taken from 
textArray (in the TextLM), and the string str should be used instead 
anyway, as it may be different from the concatenation of the single pieces 
of text; at the moment the only difference concerns the hyphenation 
character - added at the end of the line, but I suspect that in 
different languages there could be other differences; so, we cannot simply 
create a WordAreas for each AreaInfo object.


So, if you find it strange to break the text, put it together and split it 
again, me too! :-) But this initial feeling disappeared when I realized 
that the final splitting does not involve breaking in its proper sense, 
but just classification of characters.


This is why I did what I did; if I did not manage to convince you ... you 
can try and convince me! :-)


Regards
Luca




Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/

2005-10-26 Thread Manuel Mall
On Wed, 26 Oct 2005 05:15 pm, Luca Furini wrote:
 Manuel Mall wrote:
  I have a question on this. You break in TextArea the text into
  words based on CharUtilities.isAnySpace. Is this guaranteed to be
  consistent with the breaking and adjustment calculations in
  TextLayoutManager? I am concerned we may be using different rules
  for word breaking in different places.

 As far as consistency is concerned, I agree with you: the handling of
 the different kinds of spaces (breaking, non-breaking, fixed width,
 ...) is still quite incomplete and dispersed over different
 classes. Just to add another example, the CharacterLM implicitly
 expects its character to be a non-space character and has its own
 lines of code concerning the creation of the elements, while it could
 share the methods already called by the TextLM.

 Having a single, centralized class taking care of the breaking (be it
 a Java utility class or a Fop one) and a single, shared method
 implementing the creation of the elements would surely increase
 consistency and clarity.

  Somehow it doesn't feel right to me that TextLayoutManager does all
  the breaking and calculations and then we give the whole chunk to
  TextArea and it breaks it again using a possibly different
  algorithm but still using the adjustment value calculated by
  TextLayoutManager.

 When I was trying to fix bug 36238 I initially started modifying
 TextLM#createTextArea(), using the AreaInfo objects to create
 WordAreas and SpaceAreas, but I then decided to move the string
 splitting inside TextArea because:

 1) if WordAreas and SpaceAreas are not directly created by the LMs,
 there is no need to change a single line of code inside the classes
 creating TextAreas; this is not a real reason supporting the
 choice, just an handy consequence of it;

 2) if TextArea still provides a getText() method, the renderers are
 not forced to render the text word by word and space by space if
 their word spacing treatment is not affected by multi-byte
 characters; but once again, this is not a real reason as we could
 provide this method anyway;

 3) although both SpaceArea and WordArea hava an offset attribute it
 is ATM not used, so these areas does not carry any formatting
 information; their only purpose is to highlight spaces, thus
 allowing some specific renderer to handle them correctly regardless
 of their encoding; in other words, we are not losing braking and
 calculations, we simply do not need them anymore as we already know
 exactly which text will be placed in each line, and how wide it will
 be once it's correctly adjusted;

 4) the text that will be placed in a line cannot be directly taken
 from textArray (in the TextLM), and the string str should be used
 instead anyway, as it may be different from the concatenation of the
 single pieces of text; at the moment the only difference concerns the
 hyphenation character - added at the end of the line, but I suspect
 that in different languages there could be other differences; so, we
 cannot simply create a WordAreas for each AreaInfo object.

 So, if you find it strange to break the text, put it together and
 split it again, me too! :-) But this initial feeling disappeared when
 I realized that the final splitting does not involve breaking in
 its proper sense, but just classification of characters.

 This is why I did what I did; if I did not manage to convince you ...
 you can try and convince me! :-)

I must admit you haven't convinced me. The basic premise still is 
TextLayoutManager does all the calculations including determining the 
number of word spaces and the resulting adjustment, that means it must 
know where the word spaces are. Why should TextArea recalculate the 
positions (and wrong as well because isAnySpace() tests for 7 different 
UNICODE space values not all of them adjustables spaces while 
TextLayoutManager uses a much smaller set to calculate the adjustment 
values)?

There is no need to expose creation of the Space/Word areas directly to 
TextLayoutManager either. TextArea could easily expose an addWord and 
an addSpace method instead of the monolithic setText. In the end it 
probably boils down to me arguing that the setText logic currently in 
TextArea IMO should be in TextLayoutManager (and probably based on its 
data structures) because it is an operation closely coupled to layout 
and not to areas.

 Regards
  Luca

BTW, it would also be really nice to have test cases for this new 
feature even if just expanding existing test cases to test for the new 
areas created. It would make catching regressions down the track much 
easier.

Cheers

Manuel


Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/

2005-10-26 Thread Jeremias Maerki

On 26.10.2005 13:05:26 Manuel Mall wrote:
snip/
 There is no need to expose creation of the Space/Word areas directly to 
 TextLayoutManager either. TextArea could easily expose an addWord and 
 an addSpace method instead of the monolithic setText. In the end it 
 probably boils down to me arguing that the setText logic currently in 
 TextArea IMO should be in TextLayoutManager (and probably based on its 
 data structures) because it is an operation closely coupled to layout 
 and not to areas.

FWIW, I agree with Manuel that the new logic in TextArea shouldn't be
there. The area tree should simply be a data structure, nothing more.
Splitting functionality in too many places is dangerous.

  Regards
   Luca
 
 BTW, it would also be really nice to have test cases for this new 
 feature even if just expanding existing test cases to test for the new 
 areas created. It would make catching regressions down the track much 
 easier.

+1 to that. The test cases are very important to document what we can do
besides checking for regressions. I know this is additional work,
especially hard for those who have very little time available to them,
but the tests are something that is extremely valuable to improve the
quality of our package.

Jeremias Maerki



Re: svn commit: r328381 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop: area/inline/ layoutmgr/inline/ render/ render/pdf/ render/xml/

2005-10-25 Thread Manuel Mall
On Tue, 25 Oct 2005 11:14 pm, [EMAIL PROTECTED] wrote:
 Author: lfurini
 Date: Tue Oct 25 08:14:10 2005
 New Revision: 328381

 URL: http://svn.apache.org/viewcvs?rev=328381view=rev
 Log:
 Fix for bug 36238 (at least in the PDFRenderer)

 The text is split into WordAreas and SpaceAreas: the latters (when
 the font is multibyte) are not affected by the Tw operator, so they
 are shifted in the inline progression dimension instead.

 The behaviour of the other renderers should be the same as before
 this change.

Luca,

I have a question on this. You break in TextArea the text into words 
based on CharUtilities.isAnySpace. Is this guaranteed to be consistent 
with the breaking and adjustment calculations in TextLayoutManager? I 
am concerned we may be using different rules for word breaking in 
different places. Somehow it doesn't feel right to me that 
TextLayoutManager does all the breaking and calculations and then we 
give the whole chunk to TextArea and it breaks it again using a 
possibly different algorithm but still using the adjustment value 
calculated by TextLayoutManager.

May be I am too picky here or misunderstand something - sorry.

Cheers

Manuel