Re: Big/Huge XMLs

2003-05-24 Thread Rob Stote
Dunno, Matt, something else must be going on; I can run it fine on my box. I
noticed you are using the fo:page-sequence master-reference="one" tag.
Did you upgrade to at least version 0.20.4? I can run it fine with this
version and above (see the two outputs below). I can run it both embedded
(0.20.4) and from the command line (0.20.5rc3).

result is attached

Command Line:
C:\fop>fop -d ReportOutput.fo report.pdf
[DEBUG] Input mode:
[DEBUG] FO
.
[INFO] Using org.apache.xerces.parsers.SAXParser as SAX2 Parser
[INFO] FOP 0.20.5rc3a
[INFO] Using org.apache.xerces.parsers.SAXParser as SAX2 Parser
[INFO] building formatting object tree
[INFO] setting up fonts
[INFO] [1]
[WARNING] table-layout=auto is not supported, using fixed!
...
[WARNING] Sum of fixed column widths 521574 greater than maximum specified IPD 39685
[INFO] [2]

[INFO] [6]
[DEBUG] Last page-sequence produced 8 pages.
[INFO] Parsing of document complete, stopping renderer
[DEBUG] Initial heap size: 636Kb
[DEBUG] Current heap size: 12794Kb
[DEBUG] Total memory used: 12158Kb
[DEBUG]   Memory use is indicative; no GC was performed
[DEBUG]   These figures should not be used comparatively
[DEBUG] Total time used: 4998ms
[DEBUG] Pages rendered: 8
[DEBUG] Avg render time: 624ms/page

Embedded:
[ERROR] Logger not set
[WARNING] Screen logger not set.
[INFO] building formatting object tree
[INFO] [1]
[WARNING] table-layout=auto is not supported, using fixed!
[WARNING] table-layout=auto is not supported, using fixed!
[WARNING] table-layout=auto is not supported, using fixed!
[WARNING] table-layout=auto is not supported, using fixed!
[WARNING] Sum of fixed column widths 521574 greater than maximum specified
IPD 39685
[INFO] [2]
[INFO] [3]
[INFO] [4]
[INFO] [5]
[INFO] [6]
[INFO] [7]
[INFO] [8]
[INFO] Parsing of document complete, stopping renderer

Rob.
- Original Message -
From: Savino, Matt C [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, May 23, 2003 5:16 PM
Subject: RE: Big/Huge XMLs


Wow. Thanks for the extremely thorough investigation. This has me wondering:
if I use region-before for my column headers, could I just break the table
every 100 rows or so?

So I started working on it, but now I'm stumped. For some reason the
attached FO, which looks fine to me, is crashing not only FOP but Weblogic
entirely, with no warning at all. Can anyone see why? Very, very strange.

-Matt



 -Original Message-
 From: J.Pietschmann [mailto:[EMAIL PROTECTED]
 Sent: Friday, May 23, 2003 11:43 AM
 To: [EMAIL PROTECTED]
 Subject: Re: Big/Huge XMLs


 Savino, Matt C wrote:
  Below is the log output for a slightly larger (66 page) report...

 Reports are the root of all evil, oh well.

 Your small attachment expands to an impressive 4MB file, which
 contains a single table with roughly 30'000 cells. It ultimately ran
 out of memory around page 4 on my JDK 1.3.1 -Xmx64M on WinNT.  The FO
 tree for the file soaks up a good chunk of the allocated memory,
 according to Jochen Wiedmann's Dr.Mem memory profiler:

     bytes  class
   4440832  org.apache.fop.fo.flow.Block
   4376280  org.apache.fop.fo.flow.TableCell
   3964032  org.apache.fop.fo.PropertyList
   2973024  org.apache.fop.fo.PropertyManager
   2355840  org.apache.fop.fo.FOText
    703200  org.apache.fop.fo.LengthProperty
    468640  org.apache.fop.datatypes.FixedLength
    163296  org.apache.fop.datatypes.KeepValue
    438912  org.apache.fop.fo.flow.TableRow
     81024  org.apache.fop.datatypes.Keep
     81024  org.apache.fop.fo.KeepProperty
     54432  org.apache.fop.fo.flow.TableRow$CellArray
  --------
  20100536  bytes

 Another 18MB of java base objects like HashMap also contribute quite a
 bit. This means that memory is already pretty tight before the layout
 process even starts. I also noticed that the repeated font-size="8pt" and
 text-align="start" cause some bloat, and deleting them reduced the
 overall number of created objects by 10%. However, the effect on run
 time was negligible.

 Increasing the mx setting resulted in memory thrashing :-/.

 A closer look at the memory profiler statistics showed that all the
 layout data associated with table cells still hung around in memory at
 the time memory runs out. Digging further this turned out to be caused
 by table objects clinging to their layout data indefinitely. That's
 bad. I put in a small hack, using the area's back pointer to release
 the data for Table, AbstractTableBody, TableRow and TableCell after
 rendering. This allowed me to render the file, albeit slowly due to
 frequent GC. Unfortunately I'm reluctant to commit the change because
 it is likely to break some things, in particular putting IDs on a
 table will cause trouble in certain situations.  This could probably
 be fixed too, but the more the scope of the code change is broadened
 the more testing would be necessary before the next release.

 Conclusion: don't use tables, as they lock up a lot of memory until
 the page sequence ends. If you have to use tables, use short page
 sequences.

Re: Big/Huge XMLs

2003-05-24 Thread J.Pietschmann
Savino, Matt C wrote:
Wow. Thanks for the extremely thorough investigation. This has me wondering:
if I use region-before for my column headers, could I just break the table
every 100 rows or so?
It was my first thought to recommend exactly this, but it won't
help. The various table FOs keep references to the generated areas,
which is the cause of memory filling up, and they will stay in
memory until the FOs are GC'd, which is after layout of the whole
containing page sequence is complete. In my 1000-page books I have
tables too, but they are much shorter (14pt font instead of 8pt,
and NO CAPS :-), and they have far fewer cells, therefore I never
encountered the problem.
-Original Message-
Uh, Outlook in action.
J.Pietschmann


Re: Big/Huge XMLs

2003-05-23 Thread J.Pietschmann
Savino, Matt C wrote:
Below is the log output for a slightly larger (66 page) report...
Reports are the root of all evil, oh well.
Your small attachment expands to an impressive 4MB file, which
contains a single table with roughly 30'000 cells. It ultimately ran
out of memory around page 4 on my JDK 1.3.1 -Xmx64M on WinNT.  The FO
tree for the file soaks up a good chunk of the allocated memory,
according to Jochen Wiedmann's Dr.Mem memory profiler:
   bytes class
 4440832 org.apache.fop.fo.flow.Block
 4376280 org.apache.fop.fo.flow.TableCell
 3964032 org.apache.fop.fo.PropertyList
 2973024 org.apache.fop.fo.PropertyManager
 2355840 org.apache.fop.fo.FOText
  703200 org.apache.fop.fo.LengthProperty
  468640 org.apache.fop.datatypes.FixedLength
  163296 org.apache.fop.datatypes.KeepValue
  438912 org.apache.fop.fo.flow.TableRow
   81024 org.apache.fop.datatypes.Keep
   81024 org.apache.fop.fo.KeepProperty
   54432 org.apache.fop.fo.flow.TableRow$CellArray

20100536 bytes
Another 18MB of java base objects like HashMap also contribute quite a
bit. This means that memory is already pretty tight before the layout
process even starts. I also noticed that the repeated font-size="8pt" and
text-align="start" cause some bloat, and deleting them reduced the
overall number of created objects by 10%. However, the effect on run
time was negligible.
Increasing the mx setting resulted in memory thrashing :-/.
A closer look at the memory profiler statistics showed that all the
layout data associated with table cells still hung around in memory at
the time memory runs out. Digging further this turned out to be caused
by table objects clinging to their layout data indefinitely. That's
bad. I put in a small hack, using the area's back pointer to release
the data for Table, AbstractTableBody, TableRow and TableCell after
rendering. This allowed me to render the file, albeit slowly due to
frequent GC. Unfortunately I'm reluctant to commit the change because
it is likely to break some things, in particular putting IDs on a
table will cause trouble in certain situations.  This could probably
be fixed too, but the more the scope of the code change is broadened
the more testing would be necessary before the next release.
Conclusion: don't use tables, as they lock up a lot of memory until
the page sequence ends. If you have to use tables, use short page
sequences.
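
For illustration, here is a minimal sketch of what the "short page sequences" advice looks like in the generated FO (this sketch is not from the thread's attachments; page size, margins and column widths are assumed). Instead of one fo:page-sequence holding the whole report table, each chunk of rows gets its own page-sequence and its own small table, so FOP can drop that chunk's layout data when the sequence ends:

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="one"
        page-height="29.7cm" page-width="21cm"
        margin-top="1cm" margin-bottom="1cm"
        margin-left="2cm" margin-right="2cm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- first chunk: its own page-sequence, its own table -->
  <fo:page-sequence master-reference="one">
    <fo:flow flow-name="xsl-region-body">
      <fo:table table-layout="fixed" width="100%">
        <fo:table-column column-width="8.5cm"/>
        <fo:table-column column-width="8.5cm"/>
        <fo:table-body>
          <fo:table-row>
            <fo:table-cell><fo:block>row 1, col 1</fo:block></fo:table-cell>
            <fo:table-cell><fo:block>row 1, col 2</fo:block></fo:table-cell>
          </fo:table-row>
          <!-- ... rows 2-100 ... -->
        </fo:table-body>
      </fo:table>
    </fo:flow>
  </fo:page-sequence>

  <!-- second chunk: an identical page-sequence holding rows 101-200,
       a third holding rows 201-300, and so on -->

</fo:root>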
Some additional notes: the most objects created in total were of class
String (3'738'853), which is no surprise, followed by java.lang.Object
(571'663), ArrayList (389'319), HashMap$Entry(278'527) and HashMap
(122'702), which *is* a bit of a surprise. I guess the j.l.Object
counts the various arrays. Hashmap entries, lists and the objects
tended to be fairly persistent, with more than 100'000 lists and
objects as well as 135'000 hashmap entries still being referenced at
the end of the run. The hashmaps themselves are less likely to be kept;
only 7'500 out of a total of 122'000 were left. This means the
persistently referenced hashmaps keep an average of 13 entries, while
the overall average is 4.7 entries per hash map. A few maps with lots
of entries, like the ones holding the mapping from element names to
element factories, may account for much of the difference. OTOH the low
overall average indicates that many of the hashmaps stay empty; there
ought to be quite some potential for optimization. I suspect many of
the lists stay empty too. Does anybody have ideas or knowledge of
tools which allow more detailed investigation of this kind of issue?
J.Pietschmann


RE: Big/Huge XMLs

2003-05-23 Thread Savino, Matt C
Wow. Thanks for the extremely thorough investigation. This has me wondering: if
I use region-before for my column headers, could I just break the table every
100 rows or so?
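
A rough sketch of that idea in XSLT 1.0 follows (illustrative only, not a stylesheet from this thread; the report/row element names, the column headings and the 100-row chunk size are invented). Every 100th row starts a new fo:page-sequence, the column headings repeat on every page via fo:static-content in xsl-region-before, and each chunk gets its own small table; an fo:table-header inside each table would be an alternative way to repeat the headings:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="/report">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="one"
            page-height="29.7cm" page-width="21cm"
            margin-top="1cm" margin-bottom="1cm"
            margin-left="2cm" margin-right="2cm">
          <fo:region-body margin-top="1.5cm"/>
          <fo:region-before extent="1.2cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <!-- every 100th row starts a new, short page-sequence -->
      <xsl:for-each select="row[position() mod 100 = 1]">
        <fo:page-sequence master-reference="one">
          <fo:static-content flow-name="xsl-region-before">
            <fo:block font-weight="bold">Name / Site / Status</fo:block>
          </fo:static-content>
          <fo:flow flow-name="xsl-region-body">
            <fo:table table-layout="fixed" width="100%">
              <fo:table-column column-width="6cm"/>
              <fo:table-column column-width="6cm"/>
              <fo:table-column column-width="5cm"/>
              <fo:table-body>
                <!-- this row plus (at most) the 99 rows that follow it -->
                <xsl:apply-templates
                    select=". | following-sibling::row[position() &lt; 100]"/>
              </fo:table-body>
            </fo:table>
          </fo:flow>
        </fo:page-sequence>
      </xsl:for-each>
    </fo:root>
  </xsl:template>

  <xsl:template match="row">
    <fo:table-row>
      <fo:table-cell><fo:block><xsl:value-of select="name"/></fo:block></fo:table-cell>
      <fo:table-cell><fo:block><xsl:value-of select="site"/></fo:block></fo:table-cell>
      <fo:table-cell><fo:block><xsl:value-of select="status"/></fo:block></fo:table-cell>
    </fo:table-row>
  </xsl:template>

</xsl:stylesheet>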

So I started working on it, but now I'm stumped. For some reason the attached
FO, which looks fine to me, is crashing not only FOP but Weblogic entirely,
with no warning at all. Can anyone see why? Very, very strange.

-Matt



 -Original Message-
 From: J.Pietschmann [mailto:[EMAIL PROTECTED]
 Sent: Friday, May 23, 2003 11:43 AM
 To: [EMAIL PROTECTED]
 Subject: Re: Big/Huge XMLs
 
 
 Savino, Matt C wrote:
  Below is the log output for a slightly larger (66 page) report...
 
 Reports are the root of all evil, oh well.
 
 Your small attachment expands to an impressive 4MB file, which
 contains a single table with roughly 30'000 cells. It ultimately ran
 out of memory around page 4 on my JDK 1.3.1 -Xmx64M on WinNT.  The FO
 tree for the file soaks up a good chunk of the allocated memory,
 according to Jochen Wiedmann's Dr.Mem memory profiler:
 
     bytes  class
   4440832  org.apache.fop.fo.flow.Block
   4376280  org.apache.fop.fo.flow.TableCell
   3964032  org.apache.fop.fo.PropertyList
   2973024  org.apache.fop.fo.PropertyManager
   2355840  org.apache.fop.fo.FOText
    703200  org.apache.fop.fo.LengthProperty
    468640  org.apache.fop.datatypes.FixedLength
    163296  org.apache.fop.datatypes.KeepValue
    438912  org.apache.fop.fo.flow.TableRow
     81024  org.apache.fop.datatypes.Keep
     81024  org.apache.fop.fo.KeepProperty
     54432  org.apache.fop.fo.flow.TableRow$CellArray
  --------
  20100536  bytes
 
 Another 18MB of java base objects like HashMap also contribute quite a
 bit. This means that memory is already pretty tight before the layout
 process even starts. I also noticed that the repeated font-size="8pt" and
 text-align="start" cause some bloat, and deleting them reduced the
 overall number of created objects by 10%. However, the effect on run
 time was negligible.
 
 Increasing the mx setting resulted in memory thrashing :-/.
 
 A closer look at the memory profiler statistics showed that all the
 layout data associated with table cells still hung around in memory at
 the time memory runs out. Digging further this turned out to be caused
 by table objects clinging to their layout data indefinitely. That's
 bad. I put in a small hack, using the area's back pointer to release
 the data for Table, AbstractTableBody, TableRow and TableCell after
 rendering. This allowed me to render the file, albeit slowly due to
 frequent GC. Unfortunately I'm reluctant to commit the change because
 it is likely to break some things, in particular putting IDs on a
 table will cause trouble in certain situations.  This could probably
 be fixed too, but the more the scope of the code change is broadened
 the more testing would be necessary before the next release.
 
 Conclusion: don't use tables, as they lock up a lot of memory until
 the page sequence ends. If you have to use tables, use short page
 sequences.
 
 Some additional notes: the most objects created in total were of class
 String (3'738'853), which is no surprise, followed by java.lang.Object
 (571'663), ArrayList (389'319), HashMap$Entry(278'527) and HashMap
 (122'702), which *is* a bit of a surprise. I guess the j.l.Object
 counts the various arrays. Hashmap entries, lists and the objects
 tended to be fairly persistent, with more than 100'000 lists and
 objects as well as 135'000 hashmap entries still being referenced at
 the end of the run. The hashmaps themselves are less likely to be kept;
 only 7'500 out of a total of 122'000 were left. This means the
 persistently referenced hashmaps keep an average of 13 entries, while
 the overall average is 4.7 entries per hash map. A few maps with lots
 of entries, like the ones holding the mapping from element names to
 element factories, may account for much of the difference. OTOH the low
 overall average indicates that many of the hashmaps stay empty; there
 ought to be quite some potential for optimization. I suspect many of
 the lists stay empty too. Does anybody have ideas or knowledge of
 tools which allow more detailed investigation of this kind of issue?
 
 J.Pietschmann
 
 
 
 
 
attachment: ReportOutput.zip

RE: Big/Huge XMLs

2003-05-22 Thread Savino, Matt C



[I'll give J. a breather on this one] Assuming you've made certain that the
bottleneck is in FOP and not your XSLT transformation, the only thing you can
really do to help with very large reports is to break the PDFs into multiple
page-sequences, i.e. start a new page-sequence every X nodes. If you have a
natural page break that you can use, no one will ever notice in the final
output. Otherwise you may have awkward page breaks in your document. I have
attached an XSLT stylesheet that we use to essentially break the XSL-FO into
"chunks". Every 10 Investigator (=doctor) nodes we start a new page-sequence.
Since this report has a page break for each new investigator anyway, the end
result is no different.


We increased our max PDF size on this report from 30 pages to 200 using this
method, and seriously sped up rendering time for large reports. FYI, we are
running Weblogic on HP-UX with -hotspot, max-heap-size=512MB. For some
reason, the Wintel JVM seems to perform a lot better than HP-UX on FOP, and in
general.
Note: as originally pointed out on this board, there is actually a slicker,
and probably more efficient, way to do the node processing using a recursive
template. But I just found out how to do that, and our bottleneck isn't in the
XSLT, so I haven't had any motivation to go back and fix it. Here's the
example: http://www.dpawson.co.uk/xsl/sect2/N4486.html#d4085e94 - I like
Steve Tinney's solution to the grouping problem.
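
A hedged sketch of that recursive variant (invented for illustration, not the attached pvh_XmlToFo.xsl): a named template emits one fo:page-sequence for the first chunk of nodes, then calls itself on the remainder. It assumes a matching row-level template and a page master named "one" are defined elsewhere in the stylesheet, as in the sketches earlier in this archive.

<!-- recursive alternative: emit a page-sequence for the first $chunk rows,
     then recurse on the rest; element and parameter names are illustrative -->
<xsl:template name="chunk-rows">
  <xsl:param name="rows"/>
  <xsl:param name="chunk" select="100"/>
  <xsl:if test="count($rows) &gt; 0">
    <fo:page-sequence master-reference="one">
      <fo:flow flow-name="xsl-region-body">
        <fo:table table-layout="fixed" width="100%">
          <fo:table-column column-width="6cm"/>
          <fo:table-column column-width="6cm"/>
          <fo:table-column column-width="5cm"/>
          <fo:table-body>
            <!-- first $chunk nodes of the set, in document order -->
            <xsl:apply-templates select="$rows[position() &lt;= $chunk]"/>
          </fo:table-body>
        </fo:table>
      </fo:flow>
    </fo:page-sequence>
    <!-- recurse on whatever is left -->
    <xsl:call-template name="chunk-rows">
      <xsl:with-param name="rows" select="$rows[position() &gt; $chunk]"/>
      <xsl:with-param name="chunk" select="$chunk"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>

<!-- called once from the report template, e.g.:
     <xsl:call-template name="chunk-rows">
       <xsl:with-param name="rows" select="Investigator"/>
       <xsl:with-param name="chunk" select="10"/>
     </xsl:call-template> -->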

Hope this helps,
Matt


  -Original Message-
  From: Mohit Sharma [mailto:[EMAIL PROTECTED]]
  Sent: Wednesday, May 21, 2003 6:34 PM
  To: [EMAIL PROTECTED]
  Subject: Big/Huge XMLs

  I have big/huge XMLs, and I need to convert them into PDFs using FOP.
  Benchmarking the latest FOP gives poor results, both memory-wise and
  processing-wise. It's just taking too much time. The XML cannot really be
  broken down into chunks, as it's all part of a report. And I need to process
  a lot of reports overnight, and I don't have a cluster at my disposal to
  distribute the load.

  Is there a way to speed up the processing time?

  Best,
  Mohit Sharma


attachment: pvh_XmlToFo.xsl

Re: Big/Huge XMLs

2003-05-22 Thread Jeremias Maerki
In addition to Matt's comments there is: 
http://xml.apache.org/fop/running.html#memory

Also keep in mind that very high memory consumption often comes with a
decrease in speed for Java. So reducing memory consumption may improve
speed. Also avoid building a DOM for input. If you do that, try changing
to SAX event generation.

On 22.05.2003 03:34:06 Mohit Sharma wrote:
 I have big/huge XMLs, and I need to convert them into PDFs using FOP.
 Benchmarking the latest FOP gives poor results, both memory-wise and
 processing-wise. It's just taking too much time. The XML cannot really be
 broken down into chunks, as it's all part of a report. And I need to
 process a lot of reports overnight, and I don't have a cluster at my
 disposal to distribute the load.
 
 Is there a way to speed up the processing time?


Jeremias Maerki





Re: Big/Huge XMLs

2003-05-22 Thread J.Pietschmann
Savino, Matt C wrote:
We increased our max PDF size on this report from 30 pages to 200
Huh? What complications do you add to the layout to run out of
memory at only *30* pages? I never had any problems until I got
well past 1000 pages (using -mx128M, JDK 1.3.1)
J.Pietschmann



RE: Big/Huge XMLs

2003-05-22 Thread Savino, Matt C
30 pages is not our absolute max, but we have set a requirement that we have to 
be able to handle at least two concurrent reports, so we need some cushion. I 
have attached some FO for a report that runs around 50 pages, using only one 
page-sequence. I would really appreciate it if you could run it through your
FOP processor and tell me what kind of performance you see. It runs on our 
system, but two of these at once will come very close to an out of memory 
error. I have a feeling the large performance discrepancy may be because our 
report is one large table. Please let me know what you find. 


Below is the log output for a slightly larger (66 page) report also using only 
one page-sequence. I had to do some de-identification to create the report I
attached. But this log output is from the exact same report, just slightly 
different data. As you can see at -mx256m, we come close to an out of memory 
error on just this report. Once we implemented the page-sequence chunking 
though, full garbage collection knocks memory-in-use down to the range of 
25-75MB depending on the report. This is a big improvement over the 215MB that 
you still see in use below even after a full GC. This was run on my Windows dev 
box (PIII 850MHz, 512MB RAM) set to -mx256m. Memory performance and speed are
always much worse on our production boxes (HP-UX, 2x550 RISC, 2GB RAM) set to
-mx512M -- especially on 2 or more concurrent reports. We're trying to talk our 
infrastructure police into letting us set up a Win2k server with Weblogic as a 
dedicated PDF report generator. No luck yet.
 

[GC 41503K->39768K(216920K), 0.0604714 secs]
[GC 41816K->40081K(216920K), 0.0734613 secs]

(... FOP processing begins below ...)

building formatting object tree
setting up fonts
[GC 42128K->40544K(216920K), 0.0598146 secs]
[GC 42592K->41074K(216920K), 0.0308461 secs]

(... many more partial garbage collections ...)

[GC 113914K->112394K(216920K), 0.0217497 secs]
[GC 114442K->112925K(216920K), 0.077 secs]
[GC 114973K->113455K(216920K), 0.0222500 secs]
[GC 115503K->113983K(216920K), 0.0219170 secs]

(... FOP is done pre-processing, page generation begins ...)

 [1[GC 116031K->114464K(216920K), 0.0214575 secs]
[GC 116512K->114883K(216920K), 0.0216715 secs]
[GC 116931K->115305K(216920K), 0.0224520 secs]
[GC 117353K->115722K(216920K), 0.0233072 secs]
] [2[GC 117770K->116113K(216920K), 0.0208135 secs]
[GC 118161K->116531K(216920K), 0.0223947 secs]
[GC 118579K->116951K(216920K), 0.0220651 secs]
][GC 118999K->117329K(216920K), 0.0211273 secs]
 [3[GC 119377K->117768K(216920K), 0.0220210 secs]
[GC 119816K->118186K(216920K), 0.0219897 secs]
[GC 120234K->118606K(216920K), 0.0223495 secs]

(... pages 4-62 output and steady partial garbage collection ...)

 [63[GC 212110K->210967K(216920K), 0.0201827 secs]
[GC 213011K->211792K(216920K), 0.0224984 secs]
[GC 213370K->212929K(216920K), 0.0238419 secs]
[GC 214977K->214223K(216920K), 0.0349902 secs]
[GC 216271K->215075K(217464K), 0.0221023 secs]

(... full garbage collection; at this point we know most memory in use is for FOP ...)
(... i.e. normal full garbage collection brings memory in use down to ~20MB ...)
(... so at -mx256m, FOP is ~40MB away from an out of memory error ...)

[Full GC 217123K->215866K(261888K), 5.5263608 secs]
[GC 217207K->216127K(261888K), 1.4741779 secs]
] [64[GC 218175K->216477K(261888K), 0.0265551 secs]
[GC 218525K->216896K(261888K), 0.0246568 secs]
[GC 218944K->217318K(261888K), 0.0252940 secs]
][GC 219366K->217700K(261888K), 0.0236608 secs]
 [65[GC 219748K->218130K(261888K), 0.0244581 secs]
[GC 220178K->218552K(261888K), 0.0248638 secs]
[GC 220600K->218971K(261888K), 0.0303567 secs]
] [66[GC 221019K->219361K(261888K), 0.0230088 secs]
]
Parsing of document complete, stopping renderer
Initial heap size: 40908Kb
Current heap size: 219991Kb
Total memory used: 179083Kb
  Memory use is indicative; no GC was performed
  These figures should not be used comparatively
Total time used: 66075ms
Pages rendered: 66
Avg render time: 1001ms/page






 -Original Message-
 From: J.Pietschmann [mailto:[EMAIL PROTECTED]
 Sent: Thursday, May 22, 2003 12:35 PM
 To: [EMAIL PROTECTED]
 Subject: Re: Big/Huge XMLs
 
 
 Savino, Matt C wrote:
  We increased our max PDF size on this report from 30 pages to 200
 
 Huh? What complications do you add to the layout to run out of
 memory at only *30* pages? I never had any problems until I got
 well past 1000 pages (using -mx128M, JDK 1.3.1)
 
 J.Pietschmann
 
 
 
 
 
 
attachment: NAReportOutput.zip