Re: Comments on new property maker implementation
Finn Bock wrote: I would guess that doing ~6 string compares to navigate the binary tree (with 148 color keywords) is slower than one string hash, ~1.2 int compares and one string compare. But I haven't measured it, so you might be well be right. Many keyword sets for other properties are much smaller and could perhaps benefit from a more suitable collection type. [J.Pietschmann] I meant setup effort, although a binary tree will most likely do additional memory management. You are right about the lookup. Just for curiosity, where do you get the 1.2 int comparisions? A perfect hash should not have collisions. I was comparing a standard HashMap with your binary tree. A perfect hash would likely have a more complicated hash function and of course zero int compares. It might also be interesting how a trie or ternary tree (as used for hyphenation patterns) would compare to hash maps for keywords (in terms of setup costs, lookup costs and memory). I have doing a study of various Java implementations on my todo list but didn't quite get around to do this. Very interesting indeed. regards, finn
Re: Comments on new property maker implementation
[Finn Bock] You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So static int[] a = { 101,102,103,104,105,106,107,108 }; becomes in bytecodes: Method static {} 0 bipush 8 2 newarray int 4 dup 5 iconst_0 [Glen Mazza] Hmmm...Are you saying that declaring a static array isn't much (any?) faster than manually creating one? I would guess that it is a bit faster than the typical bytecode for manually created arrays since the above uses 'dup' instead of 'getstatic' or 'aload' to push the array on the stack. I didn't realize that there is code being run for static arrays--I would have thought the compiled bytecode just includes the array internally, and not the code to create it. (i.e., if you opened the bytecode you would see an array 101 102 103 104... sitting someplace.) Arrays can't be stored in the constant pool so there is no place to put the data except as bytecode. http://java.sun.com/docs/books/vmspec/html/ClassFile.doc.html#20080 It should perhaps be noted that constant instances of String is not really stored in the constant pool either. The pool just stores the utf-8 representation of the string constant, and each literal string is initialized as a new pool String object based on that (as UTF-16ish). Isn't that how C works, at least? I think so, but C has hardware support for code and data segments so C can make better use of it. Sigh...I guess I *didn't* know bytecode by heart after all! ;-) I didn't bring it up to discourage the use of static initialized arrays. If it makes sense to put something in a static array we should do so without concern of compiletime vs. runtime. After all, the initialization is only performed once per classloader. regards, finn
Re: Comments on new property maker implementation
You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So [J.Pietschmann] That's an unpleasant surprise. I was always under the impression statically initialized data was stored along with the string constants, like in C. This means a generated perfect has table wouldn't have much of an advantage over, let's say, a simple binary tree loaded with the values in proper order so that the tree becomes automatically balanced (without rotations like rb-trees do). I would guess that doing ~6 string compares to navigate the binary tree (with 148 color keywords) is slower than one string hash, ~1.2 int compares and one string compare. But I haven't measured it, so you might be well be right. Many keyword sets for other properties are much smaller and could perhaps benefit from a more suitable collection type. It would make sense, however, to properly initialitze initial size values for the various hashmaps currently used. Indeed. Rehashing a HashMap is very fast tho, so I wouldn't expect a major speedup, but it all adds up. regards, finn
RE: Comments on new property maker implementation
-Original Message- From: Finn Bock [mailto:[EMAIL PROTECTED] [ Glen : ] Sigh...I guess I *didn't* know bytecode by heart after all! ;-) I didn't bring it up to discourage the use of static initialized arrays. If it makes sense to put something in a static array we should do so without concern of compiletime vs. runtime. After all, the initialization is only performed once per classloader. Well, (sorry to disappoint you, Peter) I don't know my BC by heart, but IIRC the real difference would be in the size of the compiled classes... See also a little trick I stumbled upon for for-loops. It's common(?) knowledge that testing for a value to be greater than (or equal to) another value, is the same as testing whether the result of their subtraction is greater than (or equal to) zero --rewriting this effectively saves a processor instruction per comparison. If you subsequently combine the test and the decrementing of the counter, you can slightly reduce the size of the compiled class further. In short: for( int i = 0; i j ; ++i ) is better written as ( = faster; that is: it will save a few hundred millisecs in large loops, the few that might just be enough to give the average user the impression that the software is actually any faster than before ) for( int i = j; i 0; --i ) and even better ( leads to even more compact compiled classes; performance-boost is negligeable with current hardware, but might turn out to be an advantage --albeit a minor one - when this class has to be loaded from a network location ) for( int i = j; --i = 0; ) Cheers, Andreas
Re: Comments on new property maker implementation
Finn Bock wrote: I would guess that doing ~6 string compares to navigate the binary tree (with 148 color keywords) is slower than one string hash, ~1.2 int compares and one string compare. But I haven't measured it, so you might be well be right. Many keyword sets for other properties are much smaller and could perhaps benefit from a more suitable collection type. I meant setup effort, although a binary tree will most likely do additional memory management. You are right about the lookup. Just for curiosity, where do you get the 1.2 int comparisions? A perfect hash should not have collisions. It might also be interesting how a trie or ternary tree (as used for hyphenation patterns) would compare to hash maps for keywords (in terms of setup costs, lookup costs and memory). I have doing a study of various Java implementations on my todo list but didn't quite get around to do this. J.Pietschmann
Comments on new property maker implementation
Finn, I've looked at your changes--I like them, and I'm thankful to have someone on our team to be able to redesign the properties as you have. Getting rid of the 250 autogenerated or so classes will be a welcome improvement. Comments right now: 1.) Unlike what I was saying earlier, I don't think we should move from Property.Maker to a new PropertyMaker class after all, your design looks fine. I've noticed most subclasses of Property.Maker are within subclasses of Properties themselves (e.g., LengthProperty, LengthProperty.Maker, etc.) so it looks like a neat, clean design. 2.) The new FOPropertyMapping.java class appears (1) autogenerated, and (2) to be an XSLT masterpiece at that as well. If it is indeed in good shape, I'd like you to submit it to Bugzilla as the new fo-property-mapping.xsl, replacing the old one of that name in src/codegen. (We won't apply it however, until we no longer need the current autogenerated fo-property-mapping.xsl, i.e., until all the old properties have been tossed out.) This way if we have to make wide-ranging changes to FOPropertyMapping, we'll have a XSLT source file we can conveniently work with. (Note that putting it in codegen does *not* mean that it will be automatically autogenerated anymore--it won't, just as constants.xsl no longer is--we'll pull it out of the main Ant build target at that time and keep it the separate, manual xsltToJava target in our build file[1]. [1] http://cvs.apache.org/viewcvs.cgi/xml-fop/build.xml?rev=1.97view=auto ) Comments on FOPropertyMapping: I like removing all these autogenerated classes, but I think we can still keep some processing at compile-time for more of a performance gain, as follows: 3) I think the runtime construction of the generic properties (genericColor, genericCondBorderWidth, etc.) may not be necessary. We can still have those xslt-generated into classes (6-8 classes total), but this time we check them into FOP (again, keeping the xsl available for manual re-generation when needed). But most of the generic classes are so small (your initialization of GenericCondPadding is only 4 lines of code), that going back to creating concrete classes would be noticeably beneficial either, so I'm not recommending this change. One thing that *does* stick out, however, is the 100 or so addKeyword() calls for genericColor (the largest of the generic properties): genericColor.addKeyword(antiquewhite, #faebd7); genericColor.addKeyword(aqua, #00); genericColor.addKeyword(aquamarine, #7fffd4); . I'd like us to have a static array of these values--i.e., something done compile-time, that genericColor can just reference, so we don't have to do this keyword initialization. 4) I'd also like us to, rather than call setInherited() and setDefault() for each of the properties during initialization, for the Property/Property.Maker classes to just reference that information from two (new) static arrays, added to Constants.java. We can also get rid of these two setter methods as well (ideally there shouldn't be setters for these attributes anyway--they should remain inherent to the Property.) This change will allow us to take advantage of the fact that we are now on int-constants. getDefault(PR_WHATEVER), for example, is just Constants.DefaultArray[PR_WHATEVER]. 5) Similar to (b) above, several of the makers also have a useGeneric() initialization requirement: m = new CondLengthProperty.Maker(PR_PADDING_END); m.useGeneric(genericCondPadding); For those Makers that require it, I'd like the constructor to be expanded to this: m = new CondLengthProperty.Maker(PR_PADDING_END, genericCondPadding); Again, getting rid of the useGeneric() function. This is for more speed, encapsulation, and also shrinking FOPropertyMapping class a bit. Sorry for the long post. I'll probably have other comments in other areas, but this is all I've studied for now. Thoughts? Thanks, Glen __ Do you Yahoo!? Yahoo! Hotjobs: Enter the Signing Bonus Sweepstakes http://hotjobs.sweepstakes.yahoo.com/signingbonus
Re: Comments on new property maker implementation
[Glen Mazza] I've looked at your changes--I like them, and I'm thankful to have someone on our team to be able to redesign the properties as you have. Getting rid of the 250 autogenerated or so classes will be a welcome improvement. But the biggest improvement is IMHO the easy ability to create special maker subclasses to handle the corner cases. Take a look at IndentPropertyMaker for the calculation of start and end-indent and at BorderWidthPropertyMaker for the special handling of border-width when border-style is NONE. Comments right now: 1.) Unlike what I was saying earlier, I don't think we should move from Property.Maker to a new PropertyMaker class after all, your design looks fine. I've noticed most subclasses of Property.Maker are within subclasses of Properties themselves (e.g., LengthProperty, LengthProperty.Maker, etc.) so it looks like a neat, clean design. 2.) The new FOPropertyMapping.java class appears (1) autogenerated, and (2) to be an XSLT masterpiece at that as well. If it is indeed in good shape, I'd like you to submit it to Bugzilla as the new fo-property-mapping.xsl, replacing the old one of that name in src/codegen. Initially my new FOPropertyMapping was generated by XSLT but that is now a long time ago and I have made lots of manual changes since then. The XSLT script only handled the most common property information and was just a hack to get me started. The output isn't a complete java file, it doesn't link the subproperties to the base properties and it doesn't deal with the classname of any of the complex properties. (We won't apply it however, until we no longer need the current autogenerated fo-property-mapping.xsl, i.e., until all the old properties have been tossed out.) This way if we have to make wide-ranging changes to FOPropertyMapping, we'll have a XSLT source file we can conveniently work with. (Note that putting it in codegen does *not* mean that it will be automatically autogenerated anymore--it won't, just as constants.xsl no longer is--we'll pull it out of the main Ant build target at that time and keep it the separate, manual xsltToJava target in our build file[1]. [1] http://cvs.apache.org/viewcvs.cgi/xml-fop/build.xml?rev=1.97view=auto ) Comments on FOPropertyMapping: I like removing all these autogenerated classes, but I think we can still keep some processing at compile-time for more of a performance gain, as follows: 3) I think the runtime construction of the generic properties (genericColor, genericCondBorderWidth, etc.) may not be necessary. We can still have those xslt-generated into classes (6-8 classes total), but this time we check them into FOP (again, keeping the xsl available for manual re-generation when needed). But most of the generic classes are so small (your initialization of GenericCondPadding is only 4 lines of code), that going back to creating concrete classes would be noticeably beneficial either, so I'm not recommending this change. The generic properties are just templates that carries default data values to be used later on, so I don't fully see how we could xslt- generate them as anything other than containers of default values. Your last statement is a bit difficult to parse so I'm not sure what exactly you are recommending. One thing that *does* stick out, however, is the 100 or so addKeyword() calls for genericColor (the largest of the generic properties): genericColor.addKeyword(antiquewhite, #faebd7); genericColor.addKeyword(aqua, #00); genericColor.addKeyword(aquamarine, #7fffd4); . I'd like us to have a static array of these values--i.e., something done compile-time, that genericColor can just reference, so we don't have to do this keyword initialization. I probably need an example of what you thinking are here. Right now in HEAD all the color keywords are stored in a HashMap created in GenericColor so the keywords initialization is already done. Putting the keywords in static array would require us to somehow search the array and I don't see how that will be much faster. You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So static int[] a = { 101,102,103,104,105,106,107,108 }; becomes in bytecodes: Method static {} 0 bipush 8 2 newarray int 4 dup 5 iconst_0 6 bipush 101 8 iastore 9 dup 10 iconst_1 11 bipush 102 13 iastore 14 dup 15 iconst_2 16 bipush 103 18 iastore ... and so on for each index. (In case you don't know bytecode by heart, iconst and bipush both push a constant on the stack and iastore pops 3 items from the stack; an index, a value and an array and assign the value to the index in the array). 4) I'd also like us to, rather than call setInherited() and setDefault() for each of the properties during initialization, for the Property/Property.Maker classes to just reference that information from two (new) static arrays, added to Constants.java. We
Re: Comments on new property maker implementation
Glen Mazza wrote: One thing that *does* stick out, however, is the 100 or so addKeyword() calls for genericColor ... I'd like us to have a static array of these values--i.e., something done compile-time, that genericColor can just reference, so we don't have to do this keyword initialization. Look up perfect hash code and the associated generators on the internet, like gperf, a C++ implementation used by gcc and a veriety of other compilers to provide a data structure for mapping strings to something else in an efficient way. Mind you, this would also benefit mapping to FO and property names to their associated classes or code numbers. J.Pietschmann
Re: Comments on new property maker implementation
Finn Bock wrote: You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So That's an unpleasant surprise. I was always under the impression statically initialized data was stored along with the string constants, like in C. This means a generated perfect has table wouldn't have much of an advantage over, let's say, a simple binary tree loaded with the values in proper order so that the tree becomes automatically balanced (without rotations like rb-trees do). It would make sense, however, to properly initialitze initial size values for the various hashmaps currently used. J.Pietschmann
Re: Comments on new property maker implementation
--- Finn Bock [EMAIL PROTECTED] wrote: But the biggest improvement is IMHO the easy ability to create special maker subclasses to handle the corner cases. Take a look at IndentPropertyMaker for the calculation of start and end-indent and at BorderWidthPropertyMaker for the special handling of border-width when border-style is NONE. Well, I'm not there yet, but I'll be able to appreciate it in due time. Initially my new FOPropertyMapping was generated by XSLT but that is now a long time ago and I have made lots of manual changes since then. OK, no problem, we'll modify the Java source from now on. The generic properties are just templates that carries default data values to be used later on, so I don't fully see how we could xslt- generate them as anything other than containers of default values. Your last statement is a bit difficult to parse so I'm not sure what exactly you are recommending. Umm, never mind. What I was trying to say is that the generic templates (GenericKeep, GenericSpace, etc., of the present code) were all autogenerated. *If* you thought it still useful to keep it as such, it's OK with me (i.e., going down from 250 autogenerated to about 8 is still a very nice improvement.) But you no longer see a need for it, which is absolutely fine with me. One thing that *does* stick out, however, is the 100 or so addKeyword() calls for genericColor (the largest of the generic properties): genericColor.addKeyword(antiquewhite, #faebd7); genericColor.addKeyword(aqua, #00); genericColor.addKeyword(aquamarine, #7fffd4); . I'd like us to have a static array of these values--i.e., something done compile-time, that genericColor can just reference, so we don't have to do this keyword initialization. I probably need an example of what you thinking are here. Right now in HEAD all the color keywords are stored in a HashMap created in GenericColor so the keywords initialization is already done. OK--I see, thanks for the enlightenment here. Never mind again, I was wrong on this point. Putting the keywords in static array would require us to somehow search the array and I don't see how that will be much faster. Yes, wasn't thinking of that. 4) I'd also like us to, rather than call setInherited() and setDefault() for each of the properties during initialization, for the Property/Property.Maker classes to just reference that information from two (new) static arrays, added to Constants.java. We can also get rid of these two setter methods as well (ideally there shouldn't be setters for these attributes anyway--they should remain inherent to the Property.) This change will allow us to take advantage of the fact that we are now on int-constants. getDefault(PR_WHATEVER), for example, is just Constants.DefaultArray[PR_WHATEVER]. I think 'Default' is a bad example, noone ever tries to get the default value except for the property maker itself, but your argument holds for isInherited(). No--I don't think you've gotten my point here. I don't care about the consumers of that information--even if it is just Property.Maker. But I don't see the reason to run-time initialize a PropertyMaker with inherited and default values, because I can add the whole array in the Constants interface, or even in Property.Maker directly. static Boolean inheritedArray[] = { false // 0 true// PR_PROP_1 true// PR_PROP_2 false // PR_PROP_3 true// ... Once you initialize a Property.Maker with its PR_XX constant, *it* (the Maker) can always obtain these values by accessing inheritedArray[PR_XX] or defaultArray[PR_XX]. No reason to initialize via setInherited(true) or setDefault(5). Do you see what I'm trying to say? Still, I disagree. If one want to know is a property is inherited, the proper way to get the information should be to call propertyMapping[PR_WHATEVER].isInherited(). OK--we can place these two arrays in a location where only the Property.Makers can get to it. (Maybe a protected static array in Property.Maker?) Thoughts here? Again, getting rid of the useGeneric() function. This is for more speed, encapsulation, and also shrinking FOPropertyMapping class a bit. A very good idea. +1. I can probably make the modifications to this--looks simple. Thanks, Glen __ Do you Yahoo!? Yahoo! Hotjobs: Enter the Signing Bonus Sweepstakes http://hotjobs.sweepstakes.yahoo.com/signingbonus
Re: Comments on new property maker implementation
Finn Bock wrote: I probably need an example of what you thinking are here. Right now in HEAD all the color keywords are stored in a HashMap created in GenericColor so the keywords initialization is already done. Putting the keywords in static array would require us to somehow search the array and I don't see how that will be much faster. You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So static int[] a = { 101,102,103,104,105,106,107,108 }; becomes in bytecodes: Method static {} 0 bipush 8 2 newarray int 4 dup 5 iconst_0 6 bipush 101 8 iastore 9 dup 10 iconst_1 11 bipush 102 13 iastore 14 dup 15 iconst_2 16 bipush 103 18 iastore ... and so on for each index. (In case you don't know bytecode by heart, iconst and bipush both push a constant on the stack and iastore pops 3 items from the stack; an index, a value and an array and assign the value to the index in the array). Finn, I can't imagine there is anyone here who doesn't know bytecode by heart. (Except maybe me.) Peter -- Peter B. West http://www.powerup.com.au/~pbwest/resume.html
Re: Comments on new property maker implementation
--- Peter B. West [EMAIL PROTECTED] wrote: [Finn Bock] You should perhaps also be aware that the values in a static array gets assigned to the array one element at a time. So static int[] a = { 101,102,103,104,105,106,107,108 }; becomes in bytecodes: Method static {} 0 bipush 8 2 newarray int 4 dup 5 iconst_0 Hmmm...Are you saying that declaring a static array isn't much (any?) faster than manually creating one? I didn't realize that there is code being run for static arrays--I would have thought the compiled bytecode just includes the array internally, and not the code to create it. (i.e., if you opened the bytecode you would see an array 101 102 103 104... sitting someplace.) Isn't that how C works, at least? Sigh...I guess I *didn't* know bytecode by heart after all! ;-) Glen __ Do you Yahoo!? Yahoo! Hotjobs: Enter the Signing Bonus Sweepstakes http://hotjobs.sweepstakes.yahoo.com/signingbonus