Re: Suggestion: refactoring property access in the FO tree?
Hi Richard, Thanks for your contribution. I'll try to have a look ASAP but I'm a bit in a hurry currently. Stay tuned. Vincent richardw geoquip-rnd demon co uk a écrit : Vincent Hennebert writes: I fully agree. Good design should not be sacrificed for efficiency. Anyway only the profiling of a FOP run would give us proper indications of what is actually eating time and memory, and where we should start optimizing. If you apply the the recent patches here: http://issues.apache.org/bugzilla/show_bug.cgi?id=41656 and here: http://issues.apache.org/bugzilla/show_bug.cgi?id=41044 then process a big silly 40MB plus fop table and wait for it to bomb out with an OOM you wind up with a set of instance counts as shown: 134407 instances of class org.apache.fop.fo.FONode[] 106438 instances of class char[] 100833 instances of class org.apache.fop.fo.FOText 100805 instances of class org.apache.fop.fo.flow.Block 100803 instances of class org.apache.fop.fo.flow.TableCell 33601 instances of class org.apache.fop.fo.flow.TableRow 5618 instances of class java.lang.String 5047 instances of class java.lang.Integer 4316 instances of class java.util.HashMap$Entry 2620 instances of class org.apache.fop.fo.FObj$FObjIterator 1475 instances of class java.lang.Class Richard
Re: Suggestion: refactoring property access in the FO tree?
Vincent Hennebert writes: I fully agree. Good design should not be sacrificed for efficiency. Anyway only the profiling of a FOP run would give us proper indications of what is actually eating time and memory, and where we should start optimizing. If you apply the the recent patches here: http://issues.apache.org/bugzilla/show_bug.cgi?id=41656 and here: http://issues.apache.org/bugzilla/show_bug.cgi?id=41044 then process a big silly 40MB plus fop table and wait for it to bomb out with an OOM you wind up with a set of instance counts as shown: 134407 instances of class org.apache.fop.fo.FONode[] 106438 instances of class char[] 100833 instances of class org.apache.fop.fo.FOText 100805 instances of class org.apache.fop.fo.flow.Block 100803 instances of class org.apache.fop.fo.flow.TableCell 33601 instances of class org.apache.fop.fo.flow.TableRow 5618 instances of class java.lang.String 5047 instances of class java.lang.Integer 4316 instances of class java.util.HashMap$Entry 2620 instances of class org.apache.fop.fo.FObj$FObjIterator 1475 instances of class java.lang.Class Richard
Re: Suggestion: refactoring property access in the FO tree?
Hi Andreas, Andreas L Delmelle a écrit : On Mar 30, 2007, at 11:21, Vincent Hennebert wrote: snip/ In ancient times, each FO had a full PropertyList, so the properties could be queried via a generic get(PROPERTY_ID) accessor that was simply a proxy to the PropertyList's corresponding get(). This was, however, a much less efficient approach than what we currently have. To be sure I understand: each object had the very same list of properties, with null values for the properties wich were not applicable to it? Errm... yes, if what you mean by 'the very same' is 'a different instance of the very same type'. Yes that's what I meant. Am I imprecise sometimes. And I dare complaining about the Rec's uncertainties ;-) Each FObj instance had its very own instance of the same PropertyList type. And the loss of efficiency was due to the indirection caused by the generic get(PROPERTY_ID) I guess? The biggest inefficiency was the space that these lists consume. They allocate space for an array with a number of elements equal to the number of /possible/ properties, effectively wasting space in case of FOs to which only a handful properties apply. On top of that, this space was not reclaimed until very late in the process, whereas now, the PropertyLists and their backing arrays in most cases can be garbage-collected right after endElement(). A FO to which only three properties apply will currently have only three instance variables corresponding to those properties. In the old architecture, it would have one PropertyList member, containing an array that could store some 250 Property instances, but only three elements were actually in use... Not that I want to go back to that situation, but a Map instead of an array would probably have been acceptable? More memory-efficient without being much more CPU-intensive I think. The inefficiency caused by the indirection I would consider to be a small but necessary price to pay for an increased amount of flexibility and extensibility. I fully agree. Good design should not be sacrificed for efficiency. Anyway only the profiling of a FOP run would give us proper indications of what is actually eating time and memory, and where we should start optimizing. snip / Each of the particular FO classes can then define its own implementation, which stores the applicable properties and maybe, for some FOs (like FOText!), this implementation can simply search the ancestors, instead of having to allocate space for properties that are always identical and can't be specified anyway... Hmmm. My understanding of the whole thing is still a bit vague, but wouldn't that lead to the same code replication as we have now? I'm wondering if it's not possible to define a restricted number of implementations of this interface, each applying to a whole set of similar FOs. Good point! Or use object composition: there would be objects dealing with a particular set of properties (say, border/padddin/background), and each FO for which that set applies would be composed of the corresponding object. Perhaps the flyweight pattern could apply here: only one object for each set of properties, initializing the correct fields in the FObj. With some means to automagically wire everything by using marker interfaces or whatever. Also fine suggestions. I'm going to chew on these a bit more. As I said my ideas are still all pretty vague... Also, does anyone have knowledge about aspect programming? I've the feeling that could apply quite well here. Just been reading up on AOSD, and it does indeed seem to be fitting in here. The downside would be the loss in convenience, for instance, where we now have individual accessors returning a Length, LengthRange, Numeric... Not sure how I would address this, yet. :/ I'm not sure of what you mean? The same property can be accessed in different ways? What I mean is that right now, the bind() method already takes care of getting the right type of Property. pList.get(PROPERTY_ID).getLengthRange() for instance... An accessor getInlineProgressionDimension(), for example, returns a LengthRange (not a Property). Ok, I see your point. Well then the specific implementations I was talking about earlier could probably handle that. For example almost every formatting object can have border/padding/background. We would have a specific class in charge of getting the values of these properties set on the object (or retrieving inherited values), and converting them into the right types (e.g., LengthRange). Each applicable object would be given an instance of such a class at its creation. When needed it would call the specific methods from this class (getBorderBefore, etc.). Something like that... Cheers, Vincent
Re: Suggestion: refactoring property access in the FO tree?
Hi Andreas, I have little knowledge of the FO tree construction, so I'll perhaps make naive questions and remarks. I write them in the hope they will trigger further thoughts. Andreas L Delmelle a écrit : Hi, Just an idea that popped in my head. I was thinking of the access to a FONode's properties from the layoutengine, and it became apparent to me that ATM the approach is not too flexible: each subclass has its own little set of private instance methods and public accessors. This makes it hard to implement new properties or deal with extension properties in a generic way. Also, that causes the same code to be replicated in every class; for example each object for which the margin properties apply will have (1) a CommonMarginBlock field; (2) a commonMarginBlock = pList.getMarginBlockProps(); line in the bind method. This replication of code is unfortunate as it artificially increases the codebase size, is more error-prone and more difficult to maintain. In ancient times, each FO had a full PropertyList, so the properties could be queried via a generic get(PROPERTY_ID) accessor that was simply a proxy to the PropertyList's corresponding get(). This was, however, a much less efficient approach than what we currently have. To be sure I understand: each object had the very same list of properties, with null values for the properties wich were not applicable to it? And the loss of efficiency was due to the indirection caused by the generic get(PROPERTY_ID) I guess? My suggested best-of-both-worlds would be the addition of (at least) one new type, which would store the applicable properties for a FO. Starting with an interface: public interface FOProperties { Property get(int); Property get(String); } Each of the particular FO classes can then define its own implementation, which stores the applicable properties and maybe, for some FOs (like FOText!), this implementation can simply search the ancestors, instead of having to allocate space for properties that are always identical and can't be specified anyway... Hmmm. My understanding of the whole thing is still a bit vague, but wouldn't that lead to the same code replication as we have now? I'm wondering if it's not possible to define a restricted number of implementations of this interface, each applying to a whole set of similar FOs. Or use object composition: there would be objects dealing with a particular set of properties (say, border/padddin/background), and each FO for which that set applies would be composed of the corresponding object. Perhaps the flyweight pattern could apply here: only one object for each set of properties, initializing the correct fields in the FObj. With some means to automagically wire everything by using marker interfaces or whatever. As I said my ideas are still all pretty vague... Also, does anyone have knowledge about aspect programming? I've the feeling that could apply quite well here. The downside would be the loss in convenience, for instance, where we now have individual accessors returning a Length, LengthRange, Numeric... Not sure how I would address this, yet. :/ I'm not sure of what you mean? The same property can be accessed in different ways? If anyone has suggestions, feedback is welcome. I hope what I wrote makes sense :-\ Cheers, Vincent
Re: Suggestion: refactoring property access in the FO tree?
On Mar 30, 2007, at 11:21, Vincent Hennebert wrote: Hi Vincent, I have little knowledge of the FO tree construction, so I'll perhaps make naive questions and remarks. I write them in the hope they will trigger further thoughts. That was the general idea. :-) Besides, people who are not all too familiar yet with that part of the code are sometimes more likely to offer helpful hints (as their thinking has not yet been contaminated ;-)) snip / Also, that causes the same code to be replicated in every class; for example each object for which the margin properties apply will have (1) a CommonMarginBlock field; (2) a commonMarginBlock = pList.getMarginBlockProps(); line in the bind method. This replication of code is unfortunate as it artificially increases the codebase size, is more error-prone and more difficult to maintain. Indeed! Yet another reason pro refactoring in that area. In ancient times, each FO had a full PropertyList, so the properties could be queried via a generic get(PROPERTY_ID) accessor that was simply a proxy to the PropertyList's corresponding get(). This was, however, a much less efficient approach than what we currently have. To be sure I understand: each object had the very same list of properties, with null values for the properties wich were not applicable to it? Errm... yes, if what you mean by 'the very same' is 'a different instance of the very same type'. Each FObj instance had its very own instance of the same PropertyList type. And the loss of efficiency was due to the indirection caused by the generic get(PROPERTY_ID) I guess? The biggest inefficiency was the space that these lists consume. They allocate space for an array with a number of elements equal to the number of /possible/ properties, effectively wasting space in case of FOs to which only a handful properties apply. On top of that, this space was not reclaimed until very late in the process, whereas now, the PropertyLists and their backing arrays in most cases can be garbage-collected right after endElement(). A FO to which only three properties apply will currently have only three instance variables corresponding to those properties. In the old architecture, it would have one PropertyList member, containing an array that could store some 250 Property instances, but only three elements were actually in use... The inefficiency caused by the indirection I would consider to be a small but necessary price to pay for an increased amount of flexibility and extensibility. snip / Each of the particular FO classes can then define its own implementation, which stores the applicable properties and maybe, for some FOs (like FOText!), this implementation can simply search the ancestors, instead of having to allocate space for properties that are always identical and can't be specified anyway... Hmmm. My understanding of the whole thing is still a bit vague, but wouldn't that lead to the same code replication as we have now? I'm wondering if it's not possible to define a restricted number of implementations of this interface, each applying to a whole set of similar FOs. Good point! Or use object composition: there would be objects dealing with a particular set of properties (say, border/padddin/background), and each FO for which that set applies would be composed of the corresponding object. Perhaps the flyweight pattern could apply here: only one object for each set of properties, initializing the correct fields in the FObj. With some means to automagically wire everything by using marker interfaces or whatever. Also fine suggestions. I'm going to chew on these a bit more. As I said my ideas are still all pretty vague... Also, does anyone have knowledge about aspect programming? I've the feeling that could apply quite well here. Just been reading up on AOSD, and it does indeed seem to be fitting in here. The downside would be the loss in convenience, for instance, where we now have individual accessors returning a Length, LengthRange, Numeric... Not sure how I would address this, yet. :/ I'm not sure of what you mean? The same property can be accessed in different ways? What I mean is that right now, the bind() method already takes care of getting the right type of Property. pList.get(PROPERTY_ID).getLengthRange() for instance... An accessor getInlineProgressionDimension(), for example, returns a LengthRange (not a Property). This treatment would obviously have to be moved or performed somewhere else, if a generic get() would always return an instance of type Property. If anyone has suggestions, feedback is welcome. I hope what I wrote makes sense :-\ It did. Cheers, Andreas