Re: Suggestion: refactoring property access in the FO tree?

2007-04-27 Thread Vincent Hennebert
Hi Richard,

Thanks for your contribution. I'll try to have a look ASAP but I'm a bit
in a hurry currently. Stay tuned.

Vincent


richardw geoquip-rnd demon co uk a écrit :
 Vincent Hennebert writes:
   I fully agree. Good design should not be sacrificed for efficiency.
   Anyway only the profiling of a FOP run would give us proper indications
   of what is actually eating time and memory, and where we should start
   optimizing.
 
 
 If you apply the the recent patches here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=41656
 and here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=41044
 then process a big silly 40MB plus fop table and wait for it to
 bomb out with an OOM you wind up with a set of instance counts
 as shown:
 
 134407 instances of class org.apache.fop.fo.FONode[]
 106438 instances of class char[]
 100833 instances of class org.apache.fop.fo.FOText
 100805 instances of class org.apache.fop.fo.flow.Block
 100803 instances of class org.apache.fop.fo.flow.TableCell
 33601 instances of class org.apache.fop.fo.flow.TableRow
 5618 instances of class java.lang.String
 5047 instances of class java.lang.Integer
 4316 instances of class java.util.HashMap$Entry
 2620 instances of class org.apache.fop.fo.FObj$FObjIterator
 1475 instances of class java.lang.Class
 
 Richard
 


Re: Suggestion: refactoring property access in the FO tree?

2007-04-26 Thread richardw
Vincent Hennebert writes:
  I fully agree. Good design should not be sacrificed for efficiency.
  Anyway only the profiling of a FOP run would give us proper indications
  of what is actually eating time and memory, and where we should start
  optimizing.


If you apply the the recent patches here:
   http://issues.apache.org/bugzilla/show_bug.cgi?id=41656
and here:
   http://issues.apache.org/bugzilla/show_bug.cgi?id=41044
then process a big silly 40MB plus fop table and wait for it to
bomb out with an OOM you wind up with a set of instance counts
as shown:

134407 instances of class org.apache.fop.fo.FONode[]
106438 instances of class char[]
100833 instances of class org.apache.fop.fo.FOText
100805 instances of class org.apache.fop.fo.flow.Block
100803 instances of class org.apache.fop.fo.flow.TableCell
33601 instances of class org.apache.fop.fo.flow.TableRow
5618 instances of class java.lang.String
5047 instances of class java.lang.Integer
4316 instances of class java.util.HashMap$Entry
2620 instances of class org.apache.fop.fo.FObj$FObjIterator
1475 instances of class java.lang.Class

Richard



Re: Suggestion: refactoring property access in the FO tree?

2007-04-02 Thread Vincent Hennebert
Hi Andreas,

Andreas L Delmelle a écrit :
 On Mar 30, 2007, at 11:21, Vincent Hennebert wrote:
snip/
 In ancient times, each FO had a full PropertyList, so the properties
 could be queried via a generic get(PROPERTY_ID) accessor that was simply
 a proxy to the PropertyList's corresponding get(). This was, however, a
 much less efficient approach than what we currently have.

 To be sure I understand: each object had the very same list of
 properties, with null values for the properties wich were not applicable
 to it?
 
 Errm... yes, if what you mean by 'the very same' is 'a different
 instance of the very same type'.

Yes that's what I meant. Am I imprecise sometimes. And I dare
complaining about the Rec's uncertainties ;-)


 Each FObj instance had its very own instance of the same PropertyList type.
 
 And the loss of efficiency was due to the indirection caused by
 the generic get(PROPERTY_ID) I guess?
 
 The biggest inefficiency was the space that these lists consume. They
 allocate space for an array with a number of elements equal to the
 number of /possible/ properties, effectively wasting space in case of
 FOs to which only a handful properties apply. On top of that, this space
 was not reclaimed until very late in the process, whereas now, the
 PropertyLists and their backing arrays in most cases can be
 garbage-collected right after endElement().
 
 A FO to which only three properties apply will currently have only three
 instance variables corresponding to those properties. In the old
 architecture, it would have one PropertyList member, containing an array
 that could store some 250 Property instances, but only three elements
 were actually in use...

Not that I want to go back to that situation, but a Map instead of an
array would probably have been acceptable? More memory-efficient without
being much more CPU-intensive I think.


 The inefficiency caused by the indirection I would consider to be a
 small but necessary price to pay for an increased amount of flexibility
 and extensibility.

I fully agree. Good design should not be sacrificed for efficiency.
Anyway only the profiling of a FOP run would give us proper indications
of what is actually eating time and memory, and where we should start
optimizing.


 snip /
 Each of the particular FO classes can then define its own
 implementation, which stores the applicable properties and maybe, for
 some FOs (like FOText!), this implementation can simply search the
 ancestors, instead of having to allocate space for properties that are
 always identical and can't be specified anyway...

 Hmmm. My understanding of the whole thing is still a bit vague, but
 wouldn't that lead to the same code replication as we have now?
 I'm wondering if it's not possible to define a restricted number of
 implementations of this interface, each applying to a whole set of
 similar FOs.
 
 Good point!
 
 Or use object composition: there would be objects dealing
 with a particular set of properties (say, border/padddin/background),
 and each FO for which that set applies would be composed of the
 corresponding object.

 Perhaps the flyweight pattern could apply here: only one object for each
 set of properties, initializing the correct fields in the FObj. With
 some means to automagically wire everything by using marker interfaces
 or whatever.
 
 Also fine suggestions. I'm going to chew on these a bit more.
 
 As I said my ideas are still all pretty vague... Also, does anyone have
 knowledge about aspect programming? I've the feeling that could apply
 quite well here.
 
 Just been reading up on AOSD, and it does indeed seem to be fitting in
 here.
 
 The downside would be the loss in convenience, for instance, where we
 now have individual accessors returning a Length, LengthRange,
 Numeric... Not sure how I would address this, yet. :/

 I'm not sure of what you mean? The same property can be accessed in
 different ways?
 
 What I mean is that right now, the bind() method already takes care of
 getting the right type of Property.
 pList.get(PROPERTY_ID).getLengthRange() for instance...
 
 An accessor getInlineProgressionDimension(), for example, returns a
 LengthRange (not a Property).

Ok, I see your point. Well then the specific implementations I was
talking about earlier could probably handle that. For example almost
every formatting object can have border/padding/background. We would
have a specific class in charge of getting the values of these
properties set on the object (or retrieving inherited values), and
converting them into the right types (e.g., LengthRange). Each
applicable object would be given an instance of such a class at its
creation. When needed it would call the specific methods from this class
(getBorderBefore, etc.). Something like that...


Cheers,
Vincent


Re: Suggestion: refactoring property access in the FO tree?

2007-03-30 Thread Vincent Hennebert
Hi Andreas,

I have little knowledge of the FO tree construction, so I'll perhaps
make naive questions and remarks. I write them in the hope they will
trigger further thoughts.

Andreas L Delmelle a écrit :
 
 Hi,
 
 Just an idea that popped in my head. I was thinking of the access to a
 FONode's properties from the layoutengine, and it became apparent to me
 that ATM the approach is not too flexible: each subclass has its own
 little set of private instance methods and public accessors. This makes
 it hard to implement new properties or deal with extension properties in
 a generic way.

Also, that causes the same code to be replicated in every class; for
example each object for which the margin properties apply will have (1)
a CommonMarginBlock field; (2) a commonMarginBlock =
pList.getMarginBlockProps(); line in the bind method. This replication
of code is unfortunate as it artificially increases the codebase size,
is more error-prone and more difficult to maintain.


 In ancient times, each FO had a full PropertyList, so the properties
 could be queried via a generic get(PROPERTY_ID) accessor that was simply
 a proxy to the PropertyList's corresponding get(). This was, however, a
 much less efficient approach than what we currently have.

To be sure I understand: each object had the very same list of
properties, with null values for the properties wich were not applicable
to it? And the loss of efficiency was due to the indirection caused by
the generic get(PROPERTY_ID) I guess?


 My suggested best-of-both-worlds would be the addition of (at least) one
 new type, which would store the applicable properties for a FO.
 
 Starting with an interface:
 
 public interface FOProperties {
 Property get(int);
 Property get(String);
 }
 
 Each of the particular FO classes can then define its own
 implementation, which stores the applicable properties and maybe, for
 some FOs (like FOText!), this implementation can simply search the
 ancestors, instead of having to allocate space for properties that are
 always identical and can't be specified anyway...

Hmmm. My understanding of the whole thing is still a bit vague, but
wouldn't that lead to the same code replication as we have now? I'm
wondering if it's not possible to define a restricted number of
implementations of this interface, each applying to a whole set of
similar FOs. Or use object composition: there would be objects dealing
with a particular set of properties (say, border/padddin/background),
and each FO for which that set applies would be composed of the
corresponding object.

Perhaps the flyweight pattern could apply here: only one object for each
set of properties, initializing the correct fields in the FObj. With
some means to automagically wire everything by using marker interfaces
or whatever.

As I said my ideas are still all pretty vague... Also, does anyone have
knowledge about aspect programming? I've the feeling that could apply
quite well here.


 The downside would be the loss in convenience, for instance, where we
 now have individual accessors returning a Length, LengthRange,
 Numeric... Not sure how I would address this, yet. :/

I'm not sure of what you mean? The same property can be accessed in
different ways?


 If anyone has suggestions, feedback is welcome.

I hope what I wrote makes sense :-\


Cheers,
Vincent


Re: Suggestion: refactoring property access in the FO tree?

2007-03-30 Thread Andreas L Delmelle

On Mar 30, 2007, at 11:21, Vincent Hennebert wrote:

Hi Vincent,


I have little knowledge of the FO tree construction, so I'll perhaps
make naive questions and remarks. I write them in the hope they will
trigger further thoughts.


That was the general idea. :-)
Besides, people who are not all too familiar yet with that part of  
the code are sometimes more likely to offer helpful hints (as their  
thinking has not yet been contaminated ;-))



snip /
Also, that causes the same code to be replicated in every class; for
example each object for which the margin properties apply will have  
(1)

a CommonMarginBlock field; (2) a commonMarginBlock =
pList.getMarginBlockProps(); line in the bind method. This  
replication

of code is unfortunate as it artificially increases the codebase size,
is more error-prone and more difficult to maintain.


Indeed! Yet another reason pro refactoring in that area.


In ancient times, each FO had a full PropertyList, so the properties
could be queried via a generic get(PROPERTY_ID) accessor that was  
simply
a proxy to the PropertyList's corresponding get(). This was,  
however, a

much less efficient approach than what we currently have.


To be sure I understand: each object had the very same list of
properties, with null values for the properties wich were not  
applicable

to it?


Errm... yes, if what you mean by 'the very same' is 'a different  
instance of the very same type'.
Each FObj instance had its very own instance of the same PropertyList  
type.



And the loss of efficiency was due to the indirection caused by
the generic get(PROPERTY_ID) I guess?


The biggest inefficiency was the space that these lists consume. They  
allocate space for an array with a number of elements equal to the  
number of /possible/ properties, effectively wasting space in case of  
FOs to which only a handful properties apply. On top of that, this  
space was not reclaimed until very late in the process, whereas now,  
the PropertyLists and their backing arrays in most cases can be  
garbage-collected right after endElement().


A FO to which only three properties apply will currently have only  
three instance variables corresponding to those properties. In the  
old architecture, it would have one PropertyList member, containing  
an array that could store some 250 Property instances, but only three  
elements were actually in use...


The inefficiency caused by the indirection I would consider to be a  
small but necessary price to pay for an increased amount of  
flexibility and extensibility.



snip /
Each of the particular FO classes can then define its own
implementation, which stores the applicable properties and maybe, for
some FOs (like FOText!), this implementation can simply search the
ancestors, instead of having to allocate space for properties that  
are

always identical and can't be specified anyway...


Hmmm. My understanding of the whole thing is still a bit vague, but
wouldn't that lead to the same code replication as we have now?
I'm wondering if it's not possible to define a restricted number of
implementations of this interface, each applying to a whole set of
similar FOs.


Good point!


Or use object composition: there would be objects dealing
with a particular set of properties (say, border/padddin/background),
and each FO for which that set applies would be composed of the
corresponding object.

Perhaps the flyweight pattern could apply here: only one object for  
each

set of properties, initializing the correct fields in the FObj. With
some means to automagically wire everything by using marker interfaces
or whatever.


Also fine suggestions. I'm going to chew on these a bit more.

As I said my ideas are still all pretty vague... Also, does anyone  
have

knowledge about aspect programming? I've the feeling that could apply
quite well here.


Just been reading up on AOSD, and it does indeed seem to be fitting  
in here.



The downside would be the loss in convenience, for instance, where we
now have individual accessors returning a Length, LengthRange,
Numeric... Not sure how I would address this, yet. :/


I'm not sure of what you mean? The same property can be accessed in
different ways?


What I mean is that right now, the bind() method already takes care  
of getting the right type of Property.

pList.get(PROPERTY_ID).getLengthRange() for instance...

An accessor getInlineProgressionDimension(), for example, returns a  
LengthRange (not a Property).


This treatment would obviously have to be moved or performed  
somewhere else, if a generic get() would always return an instance of  
type Property.



If anyone has suggestions, feedback is welcome.


I hope what I wrote makes sense :-\


It did.


Cheers,

Andreas