Re: Properties Implementation and Canonical Mappings
On Mon, 2003-12-01 at 02:45, Glen Mazza wrote:
> --- John Austin [EMAIL PROTECTED] wrote:
> > The property strings are given to the Property object constructor by some path beginning with a SAX parser. It is reasonable to assume that the SAX parser loses refs to most of these strings and that the Property implementation retains the only references to these String objects.
> >
> > How big are String objects? At least 16 bytes plus storage for characters.
> >
> > What does this save us? Probably only about 1,600,000 bytes for this file. CPU cost of creating strings is probably similar to cost of checking string table for a copy.
>
> Just to clarify, the (additional?) CPU cost you're mentioning above is *not* occurring for the present process, correct? I think you're referring to the cost that would be added as a result of the changes you're recommending (because there now will be a string table search to avoid duplication).

Going back to the beginning of my involvement, I found this issue because Property searches are the high-runner for CPU in FOP. I don't want to split hairs in isolation over which search/constructor sequence is faster. I want to remove the conditions that cause the current pathology. Hash table lookups are FAST. When we invest in object creation, we recover the cost many times over in the end.

> Also, the string table you mention--I think you're speaking generically, but is there a specific, already available construct in Java that we can use for this purpose in FOP? I'd like to find out what you have in mind for a specific implementation.

HashMap works fine the way Peter has it set up in alt-design. I use the same construct in the Perl code I use to analyze the large sample FO files.

--
John Austin [EMAIL PROTECTED]
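A minimal sketch of the kind of HashMap-backed string table discussed above. It is not the alt-design code, which is not shown in this thread; the class name StringTable and the method canonical() are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical string table: keeps one shared instance per distinct string. */
final class StringTable {
    private final Map<String, String> table = new HashMap<>();

    /** Returns the canonical copy of s, storing s itself the first time it is seen. */
    String canonical(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct strings currently held. */
    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        StringTable names = new StringTable();
        // new String(...) forces two distinct objects with equal contents,
        // standing in for the strings a parser would hand over.
        String a = names.canonical(new String("font-size"));
        String b = names.canonical(new String("font-size"));
        System.out.println(a == b);        // true: both refer to the stored copy
        System.out.println(names.size());  // 1
    }
}
```

Each lookup costs one hash computation and, on a hit, one equals() check, which is the "string table search" cost asked about above. In exchange, the duplicate string becomes garbage as soon as the caller drops it.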
Re: Properties Implementation and Canonical Mappings
Input: The XSL-FO file produced from:
  DocBook: The Definitive Guide

  Document size:      648 pages        // for the O'Reilly edition
  FO file size:       21,659,370 bytes
  Properties:         526,648
  Tags:               285,223
  Height of tree:     17               // max height of the parse tree
  Unique prop names:  117              // bounded by the spec
  Unique prop values: 13,520           // bounded by the real world

Using these numbers, we can explore the sort of benefits to expect from a revised Property implementation. With over a million strings (a name and a value for each of the 526,648 properties), the FOTree for this document would use forty or fifty MB in addition to data structures.

This document can be used as an example even though it probably can't be formatted (yet) by FOP. It has a lot of tables. It could be a goal of the FOP project to format this well-known document.

I was thinking of using the XSL-FO spec from the W3C web site but couldn't find the stylesheet to make the FO file. If anyone knows where to find it, please let me know.

Statistics from this file:

Number of elements by tree level:

  level   count
      1       1
      2     473
      3    5242
      4    5480
      5    7129
      6   26231
      7   22475
      8   36447
      9   62288
     10   38536
     11   30486
     12   23641
     13   23190
     14    2023
     15     771
     16     701
     17     109

Element frequencies:

  a                                            24   // I wonder where this came from
  fo:basic-link                              5225
  fo:block                                 112142
  fo:conditional-page-master-reference         48
  fo:external-graphic                        1097
  fo:flow                                     472
  fo:footnote                                  22
  fo:footnote-body                             22
  fo:inline                                 62792
  fo:layout-master-set                          1
  fo:leader                                  1764
  fo:list-block                               279
  fo:list-item                               1004
  fo:list-item-body                          1004
  fo:list-item-label                         1004
  fo:marker                                  5335
  fo:page-number                             1872
  fo:page-number-citation                    3224
  fo:page-sequence                            472
  fo:page-sequence-master                      12
  fo:region-after                              38
  fo:region-before                             38
  fo:region-body                               38
  fo:repeatable-page-master-alternatives       12
  fo:root                                       1
  fo:simple-page-master                        38
  fo:static-content                          4720
  fo:table                                   6497
  fo:table-body                              6497
  fo:table-cell                             33174
  fo:table-column                           19225
  fo:table-footer                               1
  fo:table-header                              29
  fo:table-row                              15301
  fo:wrapper                                 1799

  Properties: 526648
  Tags:       285223
  num_keys:   117
  num_vals:   13520

--
John Austin [EMAIL PROTECTED]
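The counts above came from a Perl script that is not included in the thread. As a rough Java equivalent, the sketch below gathers the same kinds of numbers with a SAX handler. The class name FoStats and its fields are hypothetical. It is also an assumption that every attribute counts as a property and that namespace declarations (xmlns...) are skipped; the original script may have counted differently.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical counter for tags, properties, unique names/values and tree height. */
final class FoStats extends DefaultHandler {
    int tags;
    int properties;
    int maxDepth;
    final Set<String> names = new HashSet<>();
    final Set<String> values = new HashSet<>();
    private int depth;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        tags++;
        depth++;
        maxDepth = Math.max(maxDepth, depth);
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            if (name.startsWith("xmlns")) {
                continue; // assumption: namespace declarations are not properties
            }
            properties++;
            names.add(name);
            values.add(atts.getValue(i));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        depth--;
    }

    /** Parses an XML document held in a string and returns the collected counts. */
    static FoStats count(String xml) {
        FoStats stats = new FoStats();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), stats);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return stats;
    }
}
```

For a real FO file, the same handler could be passed to the parser together with a File instead of a string. The sizes of names and values correspond to the "unique prop names" and "unique prop values" lines.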
Re: Properties Implementation and Canonical Mappings
--- John Austin [EMAIL PROTECTED] wrote:
> The property strings are given to the Property object constructor by some path beginning with a SAX parser. It is reasonable to assume that the SAX parser loses refs to most of these strings and that the Property implementation retains the only references to these String objects.
>
> How big are String objects? At least 16 bytes plus storage for characters.
>
> What does this save us? Probably only about 1,600,000 bytes for this file. CPU cost of creating strings is probably similar to cost of checking string table for a copy.

Just to clarify, the (additional?) CPU cost you're mentioning above is *not* occurring for the present process, correct? I think you're referring to the cost that would be added as a result of the changes you're recommending (because there now will be a string table search to avoid duplication).

Also, the string table you mention--I think you're speaking generically, but is there a specific, already available construct in Java that we can use for this purpose in FOP? I'd like to find out what you have in mind for a specific implementation.

Thanks,
Glen
Properties Implementation and Canonical Mappings
In the interest of contributing to the proposed implementation (instead of just trashing it), I wrote a simple Perl script to get some counts out of a real-world XSL-FO file.

Input: The XSL-FO file produced from a DocBook file I have left from a dormant project. The Perl program counts the number of properties in the source file.

  PDF size:           130 pages    // some users have a lot more
  FO file size:       1.2M bytes
  Properties:         22,815
  Unique prop names:  89           // bounded by the spec
  Unique prop values: 2,227        // bounded by the real world

Note that storing the property name and value refs supplied to the Property constructor will use 45,630 strings (two per property). If the Property implementation employs canonical mapping to ensure that only one copy of each unique string is stored, then just over 2,300 strings are required (89 names plus 2,227 values).

The property strings are given to the Property object constructor by some path beginning with a SAX parser. It is reasonable to assume that the SAX parser loses refs to most of these strings and that the Property implementation retains the only references to these String objects.

How big are String objects? At least 16 bytes plus storage for characters.

What does this save us? Probably only about 1,600,000 bytes for this file. CPU cost of creating strings is probably similar to cost of checking string table for a copy.

What does it buy for us? It bounds a source of memory growth that is currently O(n). It gets us in the habit of using another good technique.

I am already thinking along the lines of: The property lists for these FOs are usually generated by programs and will be repeated many times. Perhaps we could use larger, faster working Property Lists consolidated with Canonical Mappings to save both time and space.

I am thinking again along the lines of handling properties more like a C++ virtual function table (vTable). This object is larger than Peter's ordered Property array, but would be faster. That's one reason C++ has fast virtual function dispatching.

--
John Austin [EMAIL PROTECTED]
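One way to read the vTable idea above is to give every property name a fixed slot number and store each element's properties in an array indexed by that slot, so that lookup becomes a single array access rather than a search. The sketch below works under that assumption. PropertySlots and SlotPropertyList are hypothetical names, and neither the existing FOP property list nor Peter's alt-design array is reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry that assigns each property name a fixed slot number. */
final class PropertySlots {
    private final Map<String, Integer> slots = new HashMap<>();

    /** Returns the slot for name, allocating the next free slot on first use. */
    int slotOf(String name) {
        Integer slot = slots.get(name);
        if (slot == null) {
            slot = slots.size();
            slots.put(name, slot);
        }
        return slot;
    }

    /** Number of slots allocated so far. */
    int count() {
        return slots.size();
    }
}

/** Hypothetical per-element property list: one array cell per known property. */
final class SlotPropertyList {
    private final String[] values;

    SlotPropertyList(int slotCount) {
        values = new String[slotCount];
    }

    void set(int slot, String value) {
        values[slot] = value;
    }

    /** Direct indexed lookup with no searching; null means "not specified". */
    String get(int slot) {
        return values[slot];
    }
}
```

The trade-off is the one named in the message: each list reserves a cell for every known property (bounded by the spec), so it is larger than a compact ordered array, but reading a property costs only an index operation once its slot number is known.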
Re: Properties Implementation and Canonical Mappings
Darn, recall the last post.

John Austin wrote:
> Note that storing the property name and value refs supplied to the Property constructor will use 45,630 strings. If the Property implementation employs canonical mapping to ensure that only one copy of each unique string is stored, then just over 2,300 strings are required.

Have a look at String.intern()

J.Pietschmann
Re: Properties Implementation and Canonical Mappings
On Sat, 2003-11-29 at 16:35, J.Pietschmann wrote:
> Darn, recall the last post.
>
> John Austin wrote:
> > Note that storing the property name and value refs supplied to the Property constructor will use 45,630 strings. If the Property implementation employs canonical mapping to ensure that only one copy of each unique string is stored, then just over 2,300 strings are required.
>
> Have a look at String.intern()

Bruce Eckel said not to trust it for some reason. I have the 2nd edition of Thinking in Java and the online one is the 3rd edition, so I haven't found chapter and verse for this yet. The only 'bad thing' said about it that I could find quickly was:

http://mindprod.com/jgloss/gotchas.html

The other good thing we can do is compare these string refs for equality.

> J.Pietschmann

--
John Austin [EMAIL PROTECTED]
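A small illustration of the two points in this exchange: String.intern() returns one canonical instance per distinct string value, and canonical instances can then be compared by reference. The class InternDemo and its helper sameInstance() are hypothetical names used only for this example.

```java
/** Hypothetical demo of String.intern() and reference comparison. */
final class InternDemo {
    /** True when both strings map to the same canonical (interned) instance. */
    static boolean sameInstance(String x, String y) {
        return x.intern() == y.intern();
    }

    public static void main(String[] args) {
        // new String(...) creates two distinct objects with equal contents.
        String a = new String("10pt");
        String b = new String("10pt");
        System.out.println(a == b);              // false: different objects
        System.out.println(a.equals(b));         // true: same characters
        System.out.println(sameInstance(a, b));  // true: one interned copy
    }
}
```

The reference comparison is only safe when every string involved has been interned (or drawn from the same string table); a single non-canonical string would compare unequal even with identical contents.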