Re: Properties Implementation and Canonical Mappings

2003-12-01 Thread John Austin
On Mon, 2003-12-01 at 02:45, Glen Mazza wrote:
 --- John Austin [EMAIL PROTECTED] wrote:
  
  The property strings are given to the Property
  object
  constructor by some path beginning with a SAX
  parser.
  It is reasonable to assume that the SAX parser loses
  refs to most of these strings and that the Property
  implementation retains the only references to these 
  String objects.
  
  How big are String Objects ? 
  At least 16 bytes plus storage for characters. 
  
  What does this save us ? 
  Probably only about 1,600,000 bytes for this file. 
  CPU cost of creating strings is probably similar to 
  cost of checking string table for a copy.
  
 
 Just to clarify, the (additional?) CPU cost you're
 mentioning above is *not* occurring for the present
 process, correct?  I think you're referring to the
 cost that would be added as a result of the changes
 you're recommending (because there now will be a
 string table search to avoid duplication).

Going back to the beginning of my involvement, I found this
issue because Property searches are the high-runner for CPU
in FOP. I don't want to split hairs in isolation over which
search/constructor sequence is faster. I want to remove the
conditions that cause the current pathology.

Hash table lookups are FAST. What we invest in object creation
we recover many times over in the end.

 Also, the string table you mention--I think you're
 speaking generically, but is there a specific, already
 available construct in Java that we can use for this
 purpose in FOP?  I'd like to find out what you have in
 mind for a specific implementation.

HashMap works fine the way Peter has it set up in alt-design.

I use the same construct in the Perl code I use to analyze the
large sample FO files.
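
A minimal sketch of the HashMap-based string table discussed above. The class name StringTable and its methods are hypothetical, chosen for illustration; this is not the alt-design code, only one way such a canonical mapping could look.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical canonicalizing string table; not FOP or alt-design code. */
class StringTable {
    private final Map<String, String> table = new HashMap<>();

    /** Returns the single stored copy equal to s, storing s itself on first sight. */
    String canonical(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        StringTable names = new StringTable();
        // new String(...) forces distinct objects, as a parser might deliver them.
        String a = names.canonical(new String("font-size"));
        String b = names.canonical(new String("font-size"));
        System.out.println(a == b);       // true: both refer to the stored copy
        System.out.println(names.size()); // 1
    }
}
```

Each call costs one hash lookup, and the duplicate handed in by the caller becomes garbage as soon as the caller drops it.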

-- 
John Austin [EMAIL PROTECTED]


Re: Properties Implementation and Canonical Mappings

2003-11-30 Thread John Austin
Input: The XSL-FO file produced from:
DocBook: The Definitive Guide 

Document size:      648 pages          // for the O'Reilly edition
FO file size:       21,659,370 bytes
Properties:         526,648
Tags:               285,223
Height of tree:     17                 // max height of the parse tree
Unique prop names:  117                // bounded by the spec
Unique prop values: 13,520             // bounded by the real world

Using these numbers, we can explore the sort of benefits to expect
from a revised Property implementation. With over a million strings,
the FOTree for this document would use forty or fifty MB in addition
to data structures. 
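
The arithmetic behind "over a million strings" can be sketched as follows. The counts come from the table above, and 16 bytes per String is the lower bound quoted elsewhere in this thread; character storage is deliberately left out, so the result is only a floor and sits well below the forty-to-fifty MB estimate. The class and method names are hypothetical.

```java
/** Rough lower bound on String object overhead; character data is not counted. */
class StringOverheadEstimate {
    static final long PROPERTIES = 526_648;   // from the statistics above
    static final long UNIQUE_NAMES = 117;
    static final long UNIQUE_VALUES = 13_520;
    static final long BYTES_PER_STRING = 16;  // assumed minimum, per the thread

    /** One name string and one value string per property. */
    static long uncachedStrings() {
        return 2 * PROPERTIES;
    }

    /** One string per unique name and per unique value. */
    static long canonicalStrings() {
        return UNIQUE_NAMES + UNIQUE_VALUES;
    }

    static long headerBytes(long strings) {
        return strings * BYTES_PER_STRING;
    }

    public static void main(String[] args) {
        System.out.println(uncachedStrings());               // 1053296
        System.out.println(headerBytes(uncachedStrings()));  // 16852736
        System.out.println(canonicalStrings());              // 13637
        System.out.println(headerBytes(canonicalStrings())); // 218192
    }
}
```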

This document can be used as an example even though it probably
can't be formatted (yet) by FOP. It has a lot of tables. It could 
be a goal of the FOP project to generate this well-known document.

I was thinking of using the XSL-FO spec from the W3C web site but
couldn't find the stylesheets to make the FO file. If anyone knows
where to find them, please let me know.

Statistics from this file:

Number of Elements by tree level:
level=1 count=1
level=2 count=473
level=3 count=5242
level=4 count=5480
level=5 count=7129
level=6 count=26231
level=7 count=22475
level=8 count=36447
level=9 count=62288
level=10 count=38536
level=11 count=30486
level=12 count=23641
level=13 count=23190
level=14 count=2023
level=15 count=771
level=16 count=701
level=17 count=109
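
The per-level counts above came from a Perl script that is not shown in this thread. A comparable count could be sketched in Java with a SAX handler, assuming the root element is level 1; the class name DepthHistogram and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Counts elements per tree level; the root element is level 1. */
class DepthHistogram extends DefaultHandler {
    final Map<Integer, Integer> countByLevel = new TreeMap<>();
    int maxDepth = 0;
    private int depth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        countByLevel.merge(depth, 1, Integer::sum);
        maxDepth = Math.max(maxDepth, depth);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    /** Parses an XML string and returns the filled-in histogram. */
    static DepthHistogram count(String xml) {
        try {
            DepthHistogram histogram = new DepthHistogram();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), histogram);
            return histogram;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        DepthHistogram h = count("<root><a><b/></a><a/></root>");
        h.countByLevel.forEach((level, n) ->
                System.out.println("level=" + level + " count=" + n));
        System.out.println("height=" + h.maxDepth);
    }
}
```

As a sanity check on the figures above, the seventeen level counts sum to 285,223, which matches the reported number of tags.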

Element frequencies:
a 24   // I wonder where this came from
fo:basic-link 5225
fo:block 112142
fo:conditional-page-master-reference 48
fo:external-graphic 1097
fo:flow 472
fo:footnote 22
fo:footnote-body 22
fo:inline 62792
fo:layout-master-set 1
fo:leader 1764
fo:list-block 279
fo:list-item 1004
fo:list-item-body 1004
fo:list-item-label 1004
fo:marker 5335
fo:page-number 1872
fo:page-number-citation 3224
fo:page-sequence 472
fo:page-sequence-master 12
fo:region-after 38
fo:region-before 38
fo:region-body 38
fo:repeatable-page-master-alternatives 12
fo:root 1
fo:simple-page-master 38
fo:static-content 4720
fo:table 6497
fo:table-body 6497
fo:table-cell 33174
fo:table-column 19225
fo:table-footer 1
fo:table-header 29
fo:table-row 15301
fo:wrapper 1799

Properties: 526648
Tags: 285223
num_keys: 117
num_vals: 13520


-- 

John Austin [EMAIL PROTECTED]


Re: Properties Implementation and Canonical Mappings

2003-11-30 Thread Glen Mazza
--- John Austin [EMAIL PROTECTED] wrote:
 
 The property strings are given to the Property
 object
 constructor by some path beginning with a SAX
 parser.
 It is reasonable to assume that the SAX parser loses
 refs to most of these strings and that the Property
 implementation retains the only references to these 
 String objects.
 
 How big are String Objects ? 
 At least 16 bytes plus storage for characters. 
 
 What does this save us ? 
 Probably only about 1,600,000 bytes for this file. 
 CPU cost of creating strings is probably similar to 
 cost of checking string table for a copy.
 

Just to clarify, the (additional?) CPU cost you're
mentioning above is *not* occurring for the present
process, correct?  I think you're referring to the
cost that would be added as a result of the changes
you're recommending (because there now will be a
string table search to avoid duplication).

Also, the string table you mention--I think you're
speaking generically, but is there a specific, already
available construct in Java that we can use for this
purpose in FOP?  I'd like to find out what you have in
mind for a specific implementation.

Thanks,
Glen



Properties Implementation and Canonical Mappings

2003-11-29 Thread John Austin
In the interest of contributing (instead of just
trashing) to the proposed implementation, I wrote 
a simple Perl script to get some counts out of a 
real-world XSL-FO file.

Input: The XSL-FO file produced from a DocBook file
I have left from a dormant project. The perl program 
counts the number of properties in the source file.

PDF size:           130 pages   // some users have a lot more
FO file size:       1.2M bytes
Properties:         22,815
Unique prop names:  89          // bounded by the spec
Unique prop values: 2,227       // bounded by the real world
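
The Perl script itself is not included in this message. A rough Java equivalent of the counting step might look like the sketch below, which treats every XML attribute as a property and uses a namespace-aware parser so that xmlns declarations are not counted. Both choices are assumptions, and the class name PropertyCounter is hypothetical.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Counts attributes (properties) plus their distinct names and values. */
class PropertyCounter extends DefaultHandler {
    int properties = 0;
    final Set<String> names = new HashSet<>();
    final Set<String> values = new HashSet<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            properties++;
            names.add(atts.getLocalName(i));
            values.add(atts.getValue(i));
        }
    }

    /** Parses an XML string and returns the filled-in counter. */
    static PropertyCounter count(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // assumption: skip xmlns declarations
            PropertyCounter counter = new PropertyCounter();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), counter);
            return counter;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String fo = "<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
                + "<fo:block font-size='10pt' color='red'/>"
                + "<fo:block font-size='10pt'/>"
                + "</fo:root>";
        PropertyCounter c = count(fo);
        System.out.println("Properties: " + c.properties);
        System.out.println("Unique prop names: " + c.names.size());
        System.out.println("Unique prop values: " + c.values.size());
    }
}
```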

Note that storing the property name and value refs supplied
to the Property constructor will use 45,630 strings. If the
Property implementation employs canonical mapping to ensure
that only one copy of each unique string is stored, then just
over 2,300 strings are required. 

The property strings are given to the Property object
constructor by some path beginning with a SAX parser.
It is reasonable to assume that the SAX parser loses
refs to most of these strings and that the Property
implementation retains the only references to these 
String objects.

How big are String Objects ? 
At least 16 bytes plus storage for characters. 

What does this save us ? 
Probably only about 1,600,000 bytes for this file. 
CPU cost of creating strings is probably similar to 
cost of checking string table for a copy.

What does it buy for us ?
Bounds a source of current Order(n) memory growth. 
It gets us in the habit of using another good technique.

I am already thinking along the lines of:
The property lists for these FO's are usually generated by
programs and will be repeated many times. Perhaps we
could use larger, faster working Property Lists consolidated with
Canonical Mappings to save both time and space.

I am thinking again along the lines of handling properties more
like a C++ virtual function table (vtable). This object is larger
than Peter's ordered Property array, but would be faster. 
Direct indexing of that kind is one reason C++ has fast virtual
function dispatching.
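
One possible reading of the vtable idea, sketched under assumptions: every known property name gets a fixed slot index, and each property list is an array with one slot per known property, so a lookup is a single array access instead of a search. The class PropertySlots and its methods are hypothetical and do not describe Peter's design or any existing FOP code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical slot table: one fixed array index per known property name. */
class PropertySlots {
    private final Map<String, Integer> slotOf = new HashMap<>();

    PropertySlots(String... propertyNames) {
        for (String name : propertyNames) {
            slotOf.putIfAbsent(name, slotOf.size());
        }
    }

    /** Resolved once per name, for example when the attribute is parsed. */
    int slot(String name) {
        Integer slot = slotOf.get(name);
        if (slot == null) {
            throw new IllegalArgumentException("unknown property: " + name);
        }
        return slot;
    }

    /** A property list with one slot per known property; unset slots stay null. */
    String[] newPropertyList() {
        return new String[slotOf.size()];
    }

    public static void main(String[] args) {
        PropertySlots slots = new PropertySlots("font-size", "color", "text-align");
        int fontSize = slots.slot("font-size");

        String[] block = slots.newPropertyList();
        block[fontSize] = "10pt";            // store by index
        System.out.println(block[fontSize]); // fetch by index, no search
    }
}
```

The trade-off is the one named above: the array is as long as the full set of known properties even when only a few are set, in exchange for constant-time access.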
-- 
John Austin [EMAIL PROTECTED]


Re: Properties Implementation and Canonical Mappings

2003-11-29 Thread J.Pietschmann




Re: Properties Implementation and Canonical Mappings

2003-11-29 Thread J.Pietschmann
Darn, recall the last post.

John Austin wrote:
 Note that storing the property name and value refs supplied
 to the Property constructor will use 45,630 strings. If the
 Property implementation employs canonical mapping to ensure
 that only one copy of each unique string is stored, then just
 over 2,300 strings are required. 

Have a look at String.intern()

J.Pietschmann




Re: Properties Implementation and Canonical Mappings

2003-11-29 Thread John Austin
On Sat, 2003-11-29 at 16:35, J.Pietschmann wrote:
 Darn, recall the last post.
 
 John Austin wrote:
  Note that storing the property name and value refs supplied
  to the Property constructor will use 45,630 strings. If the
  Property implementation employs canonical mapping to ensure
  that only one copy of each unique string is stored, then just
  over 2,300 strings are required. 
 
 Have a look at String.intern()

Bruce Eckel said not to trust it for some reason. I have the 2nd Ed
of Thinking in Java and the online one is the 3rd Ed, so I haven't
found chapter and verse for this yet. 

The only 'bad thing' said about it that I could find quickly was:

http://mindprod.com/jgloss/gotchas.html

The other good thing we can do with interned strings is compare
the string refs for equality.
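
A small illustration of String.intern() and reference comparison. The class and method names are hypothetical; only String.intern(), equals() and == are standard Java.

```java
/** Shows that interned copies of equal strings are the same object. */
class InternDemo {
    /** True when both strings map to the same canonical (interned) object. */
    static boolean sameCanonical(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents.
        String a = new String("font-size");
        String b = new String("font-size");

        System.out.println(a == b);              // false: different objects
        System.out.println(a.equals(b));         // true: same characters
        System.out.println(sameCanonical(a, b)); // true: one interned copy
    }
}
```

Reference comparison is only safe when every string involved has been interned or otherwise canonicalized; comparing a canonical string with a fresh one using == can give false even when the contents are equal.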


 J.Pietschmann
-- 
John Austin [EMAIL PROTECTED]