Re: Comments on the changes to the property subsystem
On Sun, Jan 25, 2004 at 11:38:14PM +0100, Finn Bock wrote: [Simon Pepping] I have just catched up with the massive changes to the property system. Allow me to share a few observations: Thanks for your comments. How do you otherwise think it compares to the previous generated property makers? I could not recall how it was done before the patch. My notes only describe FOElementMapping, and I have not searched back in CVS. I find the set of Makers as done in FOElementMapping quite baroque. The current FOPropertyMapping is quite concise and elegant. On the other hand, I am a supporter of generating code from a description of base data in an XML file, and so saw a good side to the old method as well. I have not studied it in detail though. 1. If I see correctly, PropertySets is not yet used. Correct. Its intended use is when we, at some point in the future, want to store more than just the specified properties in the FObjs. PropertyList is still a HashMap keyed on property name. Is this waiting for some other changes to be made? Yes. Either PropertyList should have a Property[] array indexed directly by the propId or the HashMap should be keyed by an Integer object with the same value as the propId: super.get(integerArray[propId]); where integerArray is initialized as: integerArray[1] = new Integer(1); integerArray[2] = new Integer(2); ... Which of these it will be depends on how much information we decide to store in the Fobj. Your patch mentions the sparse array PropertyList.values for this purpose. You made much effort to construct it. A propos, the loop in PropertySets.initialize is not easily understood; I needed the explanation you gave in an email to understand it. It would be good to add an explanation to the code. If it helps, this is how it documented it in my own notes: 'For each FO type the BitSet of allowed properties is merged with the BitSet of allowed properties of its possible direct children. When for any FO type the mergeContent subroutine modifies its BitSet, it sets the boolean variable 'loop' to true to signal that another iteration of the loop is required. By iterating over the loop until no further modifications are made, one makes sure that the merging process propagates from below to the top, that is, from any FO type to its farthest possible ancestor. This ensures that every FO type registers the allowed properties of itself and of all FO types that may ever appear in its subtree.' I think the name 'modified' or 'dirty' for the loop variable would be more descriptive. 2. In FOPropertyMapping the word 'generic' is used in two different senses: in s_generics and getGenericMappings() on the one hand, and in genericBoolean etc., createGenerics() and useGeneric() on the other hand. This may be confusing. One might e.g. be tempted to think that s_generics contains the objects genericXxx only, which is not the case. You are absolutely correct. I've took the names from the code which existed at the time. I'm also terrible at naming things so other suggestions will be welcome. I suggest that we rename s_generics and getGenericMappings to s_fomapping and getFoMappings because they deal with the properties from the xsl:fo spec. It seems indeed best to rename these two. OTOH, getGenericMappings is the only symbol that is used by other classes. But because it is probably used only once, in FObj, it is not a problem to rename it. s_generics is a mapping of propId to Property.Maker, so it should be called something like PropertyMappings, or PropertyMakers (what does s_ stand for?). FO is very general, it could equally be used for element mappings. 3. getGenericMappings rebuilds s_generics every time it is called. In the current code it seems to be called only once, by FObj when the class is loaded. Would it not be better if getGenericMappings itself would ensure that the array is built only once? Yes, but I would like to take the question a bit further and ask where such information should be cached? Storing it in static variables caches it in the classloader which makes it difficult to control the release of the memory. Perhaps it would be better to store the information in the Driver or FOUserAgent, IOW to put the caching under user control. If caching at the classloader level is the best way to do it, then it makes perfect sense to do what you suggest. I agree with Glen's reply. This is information which the application needs to hold only once, and which may persist. The user has nothing to do with this mapping. Current FOP interleaves FOtree building with area tree building and rendering; only after the last pagesequence FO node has been built could it be released (or does area tree building access the makers as well, for non-resolved properties?), which provides hardly any gain; in runs with several documents it is even a loss. I wonder why FObj makes
Re: Comments on the changes to the property subsystem
[Simon Pepping] I have just catched up with the massive changes to the property system. Allow me to share a few observations: Thanks for your comments. How do you otherwise think it compares to the previous generated property makers? 1. If I see correctly, PropertySets is not yet used. Correct. Its intended use is when we, at some point in the future, want to store more than just the specified properties in the FObjs. PropertyList is still a HashMap keyed on property name. Is this waiting for some other changes to be made? Yes. Either PropertyList should have a Property[] array indexed directly by the propId or the HashMap should be keyed by an Integer object with the same value as the propId: super.get(integerArray[propId]); where integerArray is initialized as: integerArray[1] = new Integer(1); integerArray[2] = new Integer(2); ... Which of these it will be depends on how much information we decide to store in the Fobj. 2. In FOPropertyMapping the word 'generic' is used in two different senses: in s_generics and getGenericMappings() on the one hand, and in genericBoolean etc., createGenerics() and useGeneric() on the other hand. This may be confusing. One might e.g. be tempted to think that s_generics contains the objects genericXxx only, which is not the case. You are absolutely correct. I've took the names from the code which existed at the time. I'm also terrible at naming things so other suggestions will be welcome. I suggest that we rename s_generics and getGenericMappings to s_fomapping and getFoMappings because they deal with the properties from the xsl:fo spec. 3. getGenericMappings rebuilds s_generics every time it is called. In the current code it seems to be called only once, by FObj when the class is loaded. Would it not be better if getGenericMappings itself would ensure that the array is built only once? Yes, but I would like to take the question a bit further and ask where such information should be cached? Storing it in static variables caches it in the classloader which makes it difficult to control the release of the memory. Perhaps it would be better to store the information in the Driver or FOUserAgent, IOW to put the caching under user control. If caching at the classloader level is the best way to do it, then it makes perfect sense to do what you suggest. regards, finn
Re: Comments on the changes to the property subsystem
--- Finn Bock [EMAIL PROTECTED] wrote: Yes, but I would like to take the question a bit further and ask where such information should be cached? Storing it in static variables caches it in the classloader which makes it difficult to control the release of the memory. Hmmm...where would this matter? Single-use command line would still be fine (instantly release after running.) As for servlet use, indeed it wouldn't be released after every run of a document--but I don't think you would want it released here. This information is needed for every run of a new document, correct? Also, while it is certainly a lot of data, overall its memory usage may not be that big a deal. Glen Perhaps it would be better to store the information in the Driver or FOUserAgent, IOW to put the caching under user control. If caching at the classloader level is the best way to do it, then it makes perfect sense to do what you suggest. regards, finn __ Do you Yahoo!? Yahoo! SiteBuilder - Free web site building tool. Try it! http://webhosting.yahoo.com/ps/sb/
Re: Comments on new property maker implementation
Finn Bock wrote: I would guess that doing ~6 string compares to navigate the binary tree (with 148 color keywords) is slower than one string hash, ~1.2 int compares and one string compare. But I haven't measured it, so you might be well be right. Many keyword sets for other properties are much smaller and could perhaps benefit from a more suitable collection type. [J.Pietschmann] I meant setup effort, although a binary tree will most likely do additional memory management. You are right about the lookup. Just for curiosity, where do you get the 1.2 int comparisions? A perfect hash should not have collisions. I was comparing a standard HashMap with your binary tree. A perfect hash would likely have a more complicated hash function and of course zero int compares. It might also be interesting how a trie or ternary tree (as used for hyphenation patterns) would compare to hash maps for keywords (in terms of setup costs, lookup costs and memory). I have doing a study of various Java implementations on my todo list but didn't quite get around to do this. Very interesting indeed. regards, finn
Re: Comments on new property maker implementation
[Finn Bock] You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So static int[] a = { 101,102,103,104,105,106,107,108 }; becomes in bytecodes: Method static {} 0 bipush 8 2 newarray int 4 dup 5 iconst_0 [Glen Mazza] Hmmm...Are you saying that declaring a static array isn't much (any?) faster than manually creating one? I would guess that it is a bit faster than the typical bytecode for manually created arrays since the above uses 'dup' instead of 'getstatic' or 'aload' to push the array on the stack. I didn't realize that there is code being run for static arrays--I would have thought the compiled bytecode just includes the array internally, and not the code to create it. (i.e., if you opened the bytecode you would see an array 101 102 103 104... sitting someplace.) Arrays can't be stored in the constant pool so there is no place to put the data except as bytecode. http://java.sun.com/docs/books/vmspec/html/ClassFile.doc.html#20080 It should perhaps be noted that constant instances of String is not really stored in the constant pool either. The pool just stores the utf-8 representation of the string constant, and each literal string is initialized as a new pool String object based on that (as UTF-16ish). Isn't that how C works, at least? I think so, but C has hardware support for code and data segments so C can make better use of it. Sigh...I guess I *didn't* know bytecode by heart after all! ;-) I didn't bring it up to discourage the use of static initialized arrays. If it makes sense to put something in a static array we should do so without concern of compiletime vs. runtime. After all, the initialization is only performed once per classloader. regards, finn
Re: Comments on new property maker implementation
You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So [J.Pietschmann] That's an unpleasant surprise. I was always under the impression statically initialized data was stored along with the string constants, like in C. This means a generated perfect has table wouldn't have much of an advantage over, let's say, a simple binary tree loaded with the values in proper order so that the tree becomes automatically balanced (without rotations like rb-trees do). I would guess that doing ~6 string compares to navigate the binary tree (with 148 color keywords) is slower than one string hash, ~1.2 int compares and one string compare. But I haven't measured it, so you might be well be right. Many keyword sets for other properties are much smaller and could perhaps benefit from a more suitable collection type. It would make sense, however, to properly initialitze initial size values for the various hashmaps currently used. Indeed. Rehashing a HashMap is very fast tho, so I wouldn't expect a major speedup, but it all adds up. regards, finn
RE: Comments on new property maker implementation
-Original Message- From: Finn Bock [mailto:[EMAIL PROTECTED] [ Glen : ] Sigh...I guess I *didn't* know bytecode by heart after all! ;-) I didn't bring it up to discourage the use of static initialized arrays. If it makes sense to put something in a static array we should do so without concern of compiletime vs. runtime. After all, the initialization is only performed once per classloader. Well, (sorry to disappoint you, Peter) I don't know my BC by heart, but IIRC the real difference would be in the size of the compiled classes... See also a little trick I stumbled upon for for-loops. It's common(?) knowledge that testing for a value to be greater than (or equal to) another value, is the same as testing whether the result of their subtraction is greater than (or equal to) zero --rewriting this effectively saves a processor instruction per comparison. If you subsequently combine the test and the decrementing of the counter, you can slightly reduce the size of the compiled class further. In short: for( int i = 0; i j ; ++i ) is better written as ( = faster; that is: it will save a few hundred millisecs in large loops, the few that might just be enough to give the average user the impression that the software is actually any faster than before ) for( int i = j; i 0; --i ) and even better ( leads to even more compact compiled classes; performance-boost is negligeable with current hardware, but might turn out to be an advantage --albeit a minor one - when this class has to be loaded from a network location ) for( int i = j; --i = 0; ) Cheers, Andreas
Re: Comments on new property maker implementation
Finn Bock wrote: I would guess that doing ~6 string compares to navigate the binary tree (with 148 color keywords) is slower than one string hash, ~1.2 int compares and one string compare. But I haven't measured it, so you might be well be right. Many keyword sets for other properties are much smaller and could perhaps benefit from a more suitable collection type. I meant setup effort, although a binary tree will most likely do additional memory management. You are right about the lookup. Just for curiosity, where do you get the 1.2 int comparisions? A perfect hash should not have collisions. It might also be interesting how a trie or ternary tree (as used for hyphenation patterns) would compare to hash maps for keywords (in terms of setup costs, lookup costs and memory). I have doing a study of various Java implementations on my todo list but didn't quite get around to do this. J.Pietschmann
Re: Comments on new property maker implementation
[Glen Mazza] I've looked at your changes--I like them, and I'm thankful to have someone on our team to be able to redesign the properties as you have. Getting rid of the 250 autogenerated or so classes will be a welcome improvement. But the biggest improvement is IMHO the easy ability to create special maker subclasses to handle the corner cases. Take a look at IndentPropertyMaker for the calculation of start and end-indent and at BorderWidthPropertyMaker for the special handling of border-width when border-style is NONE. Comments right now: 1.) Unlike what I was saying earlier, I don't think we should move from Property.Maker to a new PropertyMaker class after all, your design looks fine. I've noticed most subclasses of Property.Maker are within subclasses of Properties themselves (e.g., LengthProperty, LengthProperty.Maker, etc.) so it looks like a neat, clean design. 2.) The new FOPropertyMapping.java class appears (1) autogenerated, and (2) to be an XSLT masterpiece at that as well. If it is indeed in good shape, I'd like you to submit it to Bugzilla as the new fo-property-mapping.xsl, replacing the old one of that name in src/codegen. Initially my new FOPropertyMapping was generated by XSLT but that is now a long time ago and I have made lots of manual changes since then. The XSLT script only handled the most common property information and was just a hack to get me started. The output isn't a complete java file, it doesn't link the subproperties to the base properties and it doesn't deal with the classname of any of the complex properties. (We won't apply it however, until we no longer need the current autogenerated fo-property-mapping.xsl, i.e., until all the old properties have been tossed out.) This way if we have to make wide-ranging changes to FOPropertyMapping, we'll have a XSLT source file we can conveniently work with. (Note that putting it in codegen does *not* mean that it will be automatically autogenerated anymore--it won't, just as constants.xsl no longer is--we'll pull it out of the main Ant build target at that time and keep it the separate, manual xsltToJava target in our build file[1]. [1] http://cvs.apache.org/viewcvs.cgi/xml-fop/build.xml?rev=1.97view=auto ) Comments on FOPropertyMapping: I like removing all these autogenerated classes, but I think we can still keep some processing at compile-time for more of a performance gain, as follows: 3) I think the runtime construction of the generic properties (genericColor, genericCondBorderWidth, etc.) may not be necessary. We can still have those xslt-generated into classes (6-8 classes total), but this time we check them into FOP (again, keeping the xsl available for manual re-generation when needed). But most of the generic classes are so small (your initialization of GenericCondPadding is only 4 lines of code), that going back to creating concrete classes would be noticeably beneficial either, so I'm not recommending this change. The generic properties are just templates that carries default data values to be used later on, so I don't fully see how we could xslt- generate them as anything other than containers of default values. Your last statement is a bit difficult to parse so I'm not sure what exactly you are recommending. One thing that *does* stick out, however, is the 100 or so addKeyword() calls for genericColor (the largest of the generic properties): genericColor.addKeyword(antiquewhite, #faebd7); genericColor.addKeyword(aqua, #00); genericColor.addKeyword(aquamarine, #7fffd4); . I'd like us to have a static array of these values--i.e., something done compile-time, that genericColor can just reference, so we don't have to do this keyword initialization. I probably need an example of what you thinking are here. Right now in HEAD all the color keywords are stored in a HashMap created in GenericColor so the keywords initialization is already done. Putting the keywords in static array would require us to somehow search the array and I don't see how that will be much faster. You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So static int[] a = { 101,102,103,104,105,106,107,108 }; becomes in bytecodes: Method static {} 0 bipush 8 2 newarray int 4 dup 5 iconst_0 6 bipush 101 8 iastore 9 dup 10 iconst_1 11 bipush 102 13 iastore 14 dup 15 iconst_2 16 bipush 103 18 iastore ... and so on for each index. (In case you don't know bytecode by heart, iconst and bipush both push a constant on the stack and iastore pops 3 items from the stack; an index, a value and an array and assign the value to the index in the array). 4) I'd also like us to, rather than call setInherited() and setDefault() for each of the properties during initialization, for the Property/Property.Maker classes to just reference that information from two (new) static arrays, added to Constants.java. We
Re: Comments on new property maker implementation
Glen Mazza wrote: One thing that *does* stick out, however, is the 100 or so addKeyword() calls for genericColor ... I'd like us to have a static array of these values--i.e., something done compile-time, that genericColor can just reference, so we don't have to do this keyword initialization. Look up perfect hash code and the associated generators on the internet, like gperf, a C++ implementation used by gcc and a veriety of other compilers to provide a data structure for mapping strings to something else in an efficient way. Mind you, this would also benefit mapping to FO and property names to their associated classes or code numbers. J.Pietschmann
Re: Comments on new property maker implementation
Finn Bock wrote: You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So That's an unpleasant surprise. I was always under the impression statically initialized data was stored along with the string constants, like in C. This means a generated perfect has table wouldn't have much of an advantage over, let's say, a simple binary tree loaded with the values in proper order so that the tree becomes automatically balanced (without rotations like rb-trees do). It would make sense, however, to properly initialitze initial size values for the various hashmaps currently used. J.Pietschmann
Re: Comments on new property maker implementation
--- Finn Bock [EMAIL PROTECTED] wrote: But the biggest improvement is IMHO the easy ability to create special maker subclasses to handle the corner cases. Take a look at IndentPropertyMaker for the calculation of start and end-indent and at BorderWidthPropertyMaker for the special handling of border-width when border-style is NONE. Well, I'm not there yet, but I'll be able to appreciate it in due time. Initially my new FOPropertyMapping was generated by XSLT but that is now a long time ago and I have made lots of manual changes since then. OK, no problem, we'll modify the Java source from now on. The generic properties are just templates that carries default data values to be used later on, so I don't fully see how we could xslt- generate them as anything other than containers of default values. Your last statement is a bit difficult to parse so I'm not sure what exactly you are recommending. Umm, never mind. What I was trying to say is that the generic templates (GenericKeep, GenericSpace, etc., of the present code) were all autogenerated. *If* you thought it still useful to keep it as such, it's OK with me (i.e., going down from 250 autogenerated to about 8 is still a very nice improvement.) But you no longer see a need for it, which is absolutely fine with me. One thing that *does* stick out, however, is the 100 or so addKeyword() calls for genericColor (the largest of the generic properties): genericColor.addKeyword(antiquewhite, #faebd7); genericColor.addKeyword(aqua, #00); genericColor.addKeyword(aquamarine, #7fffd4); . I'd like us to have a static array of these values--i.e., something done compile-time, that genericColor can just reference, so we don't have to do this keyword initialization. I probably need an example of what you thinking are here. Right now in HEAD all the color keywords are stored in a HashMap created in GenericColor so the keywords initialization is already done. OK--I see, thanks for the enlightenment here. Never mind again, I was wrong on this point. Putting the keywords in static array would require us to somehow search the array and I don't see how that will be much faster. Yes, wasn't thinking of that. 4) I'd also like us to, rather than call setInherited() and setDefault() for each of the properties during initialization, for the Property/Property.Maker classes to just reference that information from two (new) static arrays, added to Constants.java. We can also get rid of these two setter methods as well (ideally there shouldn't be setters for these attributes anyway--they should remain inherent to the Property.) This change will allow us to take advantage of the fact that we are now on int-constants. getDefault(PR_WHATEVER), for example, is just Constants.DefaultArray[PR_WHATEVER]. I think 'Default' is a bad example, noone ever tries to get the default value except for the property maker itself, but your argument holds for isInherited(). No--I don't think you've gotten my point here. I don't care about the consumers of that information--even if it is just Property.Maker. But I don't see the reason to run-time initialize a PropertyMaker with inherited and default values, because I can add the whole array in the Constants interface, or even in Property.Maker directly. static Boolean inheritedArray[] = { false // 0 true// PR_PROP_1 true// PR_PROP_2 false // PR_PROP_3 true// ... Once you initialize a Property.Maker with its PR_XX constant, *it* (the Maker) can always obtain these values by accessing inheritedArray[PR_XX] or defaultArray[PR_XX]. No reason to initialize via setInherited(true) or setDefault(5). Do you see what I'm trying to say? Still, I disagree. If one want to know is a property is inherited, the proper way to get the information should be to call propertyMapping[PR_WHATEVER].isInherited(). OK--we can place these two arrays in a location where only the Property.Makers can get to it. (Maybe a protected static array in Property.Maker?) Thoughts here? Again, getting rid of the useGeneric() function. This is for more speed, encapsulation, and also shrinking FOPropertyMapping class a bit. A very good idea. +1. I can probably make the modifications to this--looks simple. Thanks, Glen __ Do you Yahoo!? Yahoo! Hotjobs: Enter the Signing Bonus Sweepstakes http://hotjobs.sweepstakes.yahoo.com/signingbonus
Re: Comments on new property maker implementation
Finn Bock wrote: I probably need an example of what you thinking are here. Right now in HEAD all the color keywords are stored in a HashMap created in GenericColor so the keywords initialization is already done. Putting the keywords in static array would require us to somehow search the array and I don't see how that will be much faster. You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So static int[] a = { 101,102,103,104,105,106,107,108 }; becomes in bytecodes: Method static {} 0 bipush 8 2 newarray int 4 dup 5 iconst_0 6 bipush 101 8 iastore 9 dup 10 iconst_1 11 bipush 102 13 iastore 14 dup 15 iconst_2 16 bipush 103 18 iastore ... and so on for each index. (In case you don't know bytecode by heart, iconst and bipush both push a constant on the stack and iastore pops 3 items from the stack; an index, a value and an array and assign the value to the index in the array). Finn, I can't imagine there is anyone here who doesn't know bytecode by heart. (Except maybe me.) Peter -- Peter B. West http://www.powerup.com.au/~pbwest/resume.html
Re: Comments on new property maker implementation
--- Peter B. West [EMAIL PROTECTED] wrote: [Finn Bock] You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So static int[] a = { 101,102,103,104,105,106,107,108 }; becomes in bytecodes: Method static {} 0 bipush 8 2 newarray int 4 dup 5 iconst_0 Hmmm...Are you saying that declaring a static array isn't much (any?) faster than manually creating one? I didn't realize that there is code being run for static arrays--I would have thought the compiled bytecode just includes the array internally, and not the code to create it. (i.e., if you opened the bytecode you would see an array 101 102 103 104... sitting someplace.) Isn't that how C works, at least? Sigh...I guess I *didn't* know bytecode by heart after all! ;-) Glen __ Do you Yahoo!? Yahoo! Hotjobs: Enter the Signing Bonus Sweepstakes http://hotjobs.sweepstakes.yahoo.com/signingbonus
Re: Comments
J.Pietschmann wrote: Peter B. West wrote: I recall, however, that it took me a year to gain that status, a year during which I wrote a considerable amount of code which I maintained in my ISP account. My crime was that I did not toe the Party line. I hope those days are gone, and that, should a developer happen along who contributes to alt.design, and expresses a desire to continue to work on it, he or she will be granted committer status with the now customary alacrity. Sorry, I don't see much value in using pull parsing instead of push parsing either. Now if you could get footnote separators to appear in HEAD, or TR14 line breaking, or side floats... So we shall disagree. That is not my point. Is disagreement allowed? Peter -- Peter B. West http://www.powerup.com.au/~pbwest/resume.html - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, email: [EMAIL PROTECTED]