Re: Big/Huge XMLs
Dunno, Matt, something else must be going on; I can run it fine on my box. I noticed you are using the fo:page-sequence master-reference="one" tag. Did you upgrade to at least version 0.20.4? I can run it fine with that version and above (see the two outputs below). I can run it both embedded (0.20.4) and from the command line (0.20.5rc3); the result is attached.

Command line:

C:\fop>fop -d ReportOutput.fo report.pdf
[DEBUG] Input mode:
[DEBUG] FO
[INFO] Using org.apache.xerces.parsers.SAXParser as SAX2 Parser
[INFO] FOP 0.20.5rc3a
[INFO] Using org.apache.xerces.parsers.SAXParser as SAX2 Parser
[INFO] building formatting object tree
[INFO] setting up fonts
[INFO] [1]
[WARNING] table-layout=auto is not supported, using fixed!
...
[WARNING] Sum of fixed column widths 521574 greater than maximum specified IPD 39685
[INFO] [2]
[INFO] [6]
[DEBUG] Last page-sequence produced 8 pages.
[INFO] Parsing of document complete, stopping renderer
[DEBUG] Initial heap size: 636Kb
[DEBUG] Current heap size: 12794Kb
[DEBUG] Total memory used: 12158Kb
[DEBUG] Memory use is indicative; no GC was performed
[DEBUG] These figures should not be used comparatively
[DEBUG] Total time used: 4998ms
[DEBUG] Pages rendered: 8
[DEBUG] Avg render time: 624ms/page

Embedded:

[ERROR] Logger not set
[WARNING] Screen logger not set.
[INFO] building formatting object tree
[INFO] [1]
[WARNING] table-layout=auto is not supported, using fixed!
[WARNING] table-layout=auto is not supported, using fixed!
[WARNING] table-layout=auto is not supported, using fixed!
[WARNING] table-layout=auto is not supported, using fixed!
[WARNING] Sum of fixed column widths 521574 greater than maximum specified IPD 39685
[INFO] [2]
[INFO] [3]
[INFO] [4]
[INFO] [5]
[INFO] [6]
[INFO] [7]
[INFO] [8]
[INFO] Parsing of document complete, stopping renderer

Rob.

----- Original Message -----
From: Savino, Matt C [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, May 23, 2003 5:16 PM
Subject: RE: Big/Huge XMLs
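As a side note, the FOP log above also warns that table-layout=auto is not supported and that the sum of the fixed column widths is greater than the available IPD (the inline-progression-dimension of the containing area). A minimal, hypothetical sketch of a table definition that avoids both warnings under FOP 0.20.x; the column count, widths and captions are invented for illustration and are not taken from the actual report:

<!-- hypothetical example: declare fixed layout explicitly and give every
     column a width small enough that the columns together fit the body -->
<fo:table table-layout="fixed" width="100%">
  <fo:table-column column-width="8cm"/>
  <fo:table-column column-width="4cm"/>
  <fo:table-column column-width="4cm"/>
  <fo:table-body>
    <fo:table-row>
      <fo:table-cell><fo:block>Investigator</fo:block></fo:table-cell>
      <fo:table-cell><fo:block>Site</fo:block></fo:table-cell>
      <fo:table-cell><fo:block>Status</fo:block></fo:table-cell>
    </fo:table-row>
  </fo:table-body>
</fo:table>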
Re: Big/Huge XMLs
Savino, Matt C wrote:
> Wow. Thanks for the extremely thorough investigation. This has me wondering: if I use region-before for my column headers, could I just break the table every 100 rows or so?

It was my first thought to recommend exactly this, but it won't help. The various table FOs keep references to the generated areas, which is what causes memory to fill up, and they stay in memory until the FOs are GC'd, which happens only after layout of the whole containing page sequence is complete. In my 1000-page books I have tables too, but they are much shorter (14pt font instead of 8pt, and NO CAPS :-), and they have far fewer cells, therefore I never encountered the problem.

> -----Original Message-----

Uh, Outlook in action.

J.Pietschmann
Re: Big/Huge XMLs
Savino, Matt C wrote:
> Below is the log output for a slightly larger (66 page) report...

Reports are the root of all evil, oh well.

Your small attachment expands to an impressive 4MB file, which contains a single table with roughly 30'000 cells. It ultimately ran out of memory around page 4 on my JDK 1.3.1 with -Xmx64M on WinNT. The FO tree for the file soaks up a good chunk of the allocated memory, according to Jochen Wiedmann's Dr.Mem memory profiler:

    bytes  class
  4440832  org.apache.fop.fo.flow.Block
  4376280  org.apache.fop.fo.flow.TableCell
  3964032  org.apache.fop.fo.PropertyList
  2973024  org.apache.fop.fo.PropertyManager
  2355840  org.apache.fop.fo.FOText
   703200  org.apache.fop.fo.LengthProperty
   468640  org.apache.fop.datatypes.FixedLength
   163296  org.apache.fop.datatypes.KeepValue
   438912  org.apache.fop.fo.flow.TableRow
    81024  org.apache.fop.datatypes.Keep
    81024  org.apache.fop.fo.KeepProperty
    54432  org.apache.fop.fo.flow.TableRow$CellArray
 20100536  bytes (total)

Another 18MB of Java base objects like HashMap also contributes quite a bit. This means that memory is already pretty tight before the layout process even starts. I also noticed that the repeated font-size=8pt and text-align=start cause some bloat; deleting them reduced the overall number of created objects by 10%. However, the effect on run time was negligible. Increasing the -mx setting resulted in memory thrashing :-/

A closer look at the memory profiler statistics showed that all the layout data associated with table cells still hung around in memory at the time memory ran out. Digging further, this turned out to be caused by table objects clinging to their layout data indefinitely. That's bad. I put in a small hack, using the area's back pointer, to release the data for Table, AbstractTableBody, TableRow and TableCell after rendering. This allowed me to render the file, albeit slowly due to frequent GC. Unfortunately I'm reluctant to commit the change because it is likely to break some things; in particular, putting IDs on a table will cause trouble in certain situations. This could probably be fixed too, but the more the scope of the code change is broadened, the more testing would be necessary before the next release.

Conclusion: don't use tables, as they lock up a lot of memory until the page sequence ends. If you have to use tables, use short page sequences.

Some additional notes: the most objects created in total were of class String (3'738'853), which is no surprise, followed by java.lang.Object (571'663), ArrayList (389'319), HashMap$Entry (278'527) and HashMap (122'702), which *is* a bit of a surprise. I guess the j.l.Object count covers the various arrays. HashMap entries, lists and the plain objects tended to be fairly persistent, with more than 100'000 lists and objects as well as 135'000 HashMap entries still being referenced at the end of the run. The HashMaps themselves are less likely to be kept: only 7'500 out of a total of 122'000 were left. This means the persistently referenced HashMaps keep an average of 13 entries, while the overall average is 4.7 entries per HashMap. A few maps with lots of entries, like the ones holding the mapping from element names to element factories, may account for much of the difference. OTOH, the low overall average indicates that many of the HashMaps stay empty; there ought to be quite some potential for optimization. I suspect many of the lists stay empty too. Does anybody have ideas or knowledge of tools which allow more detailed investigation of this kind of issue?
J.Pietschmann
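To make the "short page sequences" conclusion above concrete: the idea is to close each fo:page-sequence periodically so FOP can drop that chunk's layout data, repeating the column captions from a fo:static-content in region-before (or from a fo:table-header inside each chunk's table). A rough, hypothetical sketch of the output shape; the page master, dimensions and captions are invented, not taken from the actual report FO:

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="one" page-height="29.7cm" page-width="21cm">
      <fo:region-body margin-top="2cm"/>
      <fo:region-before extent="1.5cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- one short page-sequence per chunk of rows; when it ends, FOP can
       release the FOs and areas belonging to that chunk -->
  <fo:page-sequence master-reference="one">
    <fo:static-content flow-name="xsl-region-before">
      <fo:block>column captions repeated on every page</fo:block>
    </fo:static-content>
    <fo:flow flow-name="xsl-region-body">
      <fo:table table-layout="fixed" width="100%">
        <fo:table-column column-width="8cm"/>
        <fo:table-column column-width="8cm"/>
        <fo:table-body>
          <!-- first ~100 rows of the report table -->
          <fo:table-row>
            <fo:table-cell><fo:block>...</fo:block></fo:table-cell>
            <fo:table-cell><fo:block>...</fo:block></fo:table-cell>
          </fo:table-row>
        </fo:table-body>
      </fo:table>
    </fo:flow>
  </fo:page-sequence>

  <!-- repeat: next page-sequence with the same skeleton and the next ~100 rows -->
</fo:root>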
RE: Big/Huge XMLs
Wow. Thanks for the extremely thorough investigation. This has me wondering: if I use region-before for my column headers, could I just break the table every 100 rows or so? So I started working on it, but now I'm stumped. For some reason the attached FO, which looks fine to me, is crashing not only FOP but Weblogic entirely, with no warning at all. Can anyone see why? Very, very strange.

-Matt

-----Original Message-----
From: J.Pietschmann [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 23, 2003 11:43 AM
To: [EMAIL PROTECTED]
Subject: Re: Big/Huge XMLs
attachment: ReportOutput.zip
RE: Big/Huge XMLs
[I'll give J. a breather on this one]

Assuming you've made certain that the bottleneck is in FOP and not your XSLT transformation, the only thing you can really do to help with very large reports is to break the PDF into multiple page-sequences, i.e. start a new page-sequence every X nodes. If you have a natural page break that you can use, no one will ever notice in the final output. Otherwise you may have awkward page breaks in your document.

I have attached an XSLT stylesheet that we use to essentially break the XSL-FO into "chunks". Every 10 Investigator (= doctor) nodes we start a new page-sequence. Since this report has a page break for each new investigator anyway, the end result is no different. We increased our max PDF size on this report from 30 pages to 200 using this method, and seriously sped up rendering time for large reports. FYI - we are running Weblogic on HP-UX with -hotspot, max-heap-size=512MB. For some reason, the Wintel JVM seems to perform a lot better than HP-UX on FOP, and in general.

Note: as originally pointed out on this board, there is actually a slicker, and probably more efficient, way to do the node processing using a recurring template. But I just found out how to do that, and our bottleneck isn't in the XSLT, so I haven't had any motivation to go back and fix it. Here's the example: http://www.dpawson.co.uk/xsl/sect2/N4486.html#d4085e94 I like Steve Tinney's solution to the grouping problem.

Hope this helps,
Matt

-----Original Message-----
From: Mohit Sharma [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 21, 2003 6:34 PM
To: [EMAIL PROTECTED]
Subject: Big/Huge XMLs

I have big/huge XMLs, and I need to convert them into PDFs using FOP. Benchmarking the latest FOP gives poor results, both memory-wise and processing-wise. It's just taking too much time. The XML cannot really be broken down into chunks, as it's all part of a report. And I need to process a lot of reports overnight, and I don't have a cluster at my disposal to distribute the load. Is there a way to speed up the processing time?

Best,
Mohit Sharma

pvh_XmlToFo.xsl
Description: pvh_XmlToFo.xsl
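The attached pvh_XmlToFo.xsl is not reproduced in the archive, but the every-N-nodes chunking Matt describes can be sketched in XSLT 1.0 roughly as below. Only the Investigator element name comes from his mail; the root element, child names and page master are invented for illustration:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="/Report">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="one" page-height="29.7cm" page-width="21cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <!-- start a new page-sequence at every 10th Investigator and pull in
           that node plus the nine Investigators that follow it -->
      <xsl:for-each select="Investigator[position() mod 10 = 1]">
        <fo:page-sequence master-reference="one">
          <fo:flow flow-name="xsl-region-body">
            <xsl:apply-templates
                select=". | following-sibling::Investigator[position() &lt; 10]"/>
          </fo:flow>
        </fo:page-sequence>
      </xsl:for-each>
    </fo:root>
  </xsl:template>

  <xsl:template match="Investigator">
    <!-- the per-investigator content (e.g. a table) goes here; a page break
         per investigator keeps the output identical to the unchunked report -->
    <fo:block break-before="page">
      <xsl:value-of select="Name"/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>

The recursive-template variant Matt mentions (and the grouping techniques on the Dave Pawson page he links) produce the same chunked page-sequences; the for-each form above is just the most compact way to show the idea.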
Re: Big/Huge XMLs
In addition to Matt's comments there is: http://xml.apache.org/fop/running.html#memory

Also keep in mind that very high memory consumption often comes with a decrease in speed in Java, so reducing memory consumption may improve speed. Also avoid building a DOM for input; if you do that, try changing to SAX event generation.

On 22.05.2003 03:34:06 Mohit Sharma wrote:
> I have big/huge XMLs, and I need to convert them into PDFs using FOP.
> Benchmarking the latest FOP gives poor results, both memory-wise and
> processing-wise. It's just taking too much time. The XML cannot really be
> broken down into chunks, as it's all part of a report. And I need to process
> a lot of reports overnight, and I don't have a cluster at my disposal to
> distribute the load. Is there a way to speed up the processing time?

Jeremias Maerki
Re: Big/Huge XMLs
Savino, Matt C wrote:
> We increased our max PDF size on this report from 30 pages to 200

Huh? What complications do you add to the layout to run out of memory at only *30* pages? I never had any problems until I got well past 1000 pages (using -mx128M, JDK 1.3.1).

J.Pietschmann
RE: Big/Huge XMLs
30 pages is not our absolute max, but we have set a requirement that we have to be able to handle at least two concurrent reports, so we need some cushion. I have attached some FO for a report that runs around 50 pages, using only one page-sequence. I would really appreciate it if you could run it through your FOP processor and tell me what kind of performance you see. It runs on our system, but two of these at once will come very close to an out-of-memory error. I have a feeling the large performance discrepancy may be because our report is one large table. Please let me know what you find.

Below is the log output for a slightly larger (66 page) report, also using only one page-sequence. I had to do some de-identification to create the report I attached, but this log output is from the exact same report, just slightly different data. As you can see, at -mx256m we come close to an out-of-memory error on just this report. Once we implemented the page-sequence chunking though, full garbage collection knocks memory-in-use down to the range of 25-75MB depending on the report. This is a big improvement over the 215MB that you still see in use below, even after a full GC. This was run on my Windows dev box (PIII 850MHz, 512MB RAM) set to -mx256m. Memory performance and speed are always much worse on our production boxes (HP-UX, 2x550 RISC, 2GB RAM) set to -mx512M -- especially on 2 or more concurrent reports. We're trying to talk our infrastructure police into letting us set up a Win2k server with Weblogic as a dedicated PDF report generator. No luck yet.

[GC 41503K-39768K(216920K), 0.0604714 secs]
[GC 41816K-40081K(216920K), 0.0734613 secs]
(... FOP processing begins below ...)
building formatting object tree
setting up fonts
[GC 42128K-40544K(216920K), 0.0598146 secs]
[GC 42592K-41074K(216920K), 0.0308461 secs]
(... many more partial garbage collections ...)
[GC 113914K-112394K(216920K), 0.0217497 secs]
[GC 114442K-112925K(216920K), 0.077 secs]
[GC 114973K-113455K(216920K), 0.0222500 secs]
[GC 115503K-113983K(216920K), 0.0219170 secs]
(... FOP is done pre-processing, page generation begins ...)
[1[GC 116031K-114464K(216920K), 0.0214575 secs]
[GC 116512K-114883K(216920K), 0.0216715 secs]
[GC 116931K-115305K(216920K), 0.0224520 secs]
[GC 117353K-115722K(216920K), 0.0233072 secs]
]
[2[GC 117770K-116113K(216920K), 0.0208135 secs]
[GC 118161K-116531K(216920K), 0.0223947 secs]
[GC 118579K-116951K(216920K), 0.0220651 secs]
][GC 118999K-117329K(216920K), 0.0211273 secs]
[3[GC 119377K-117768K(216920K), 0.0220210 secs]
[GC 119816K-118186K(216920K), 0.0219897 secs]
[GC 120234K-118606K(216920K), 0.0223495 secs]
(... pages 4-62 output and steady partial garbage collection ...)
[63[GC 212110K-210967K(216920K), 0.0201827 secs]
[GC 213011K-211792K(216920K), 0.0224984 secs]
[GC 213370K-212929K(216920K), 0.0238419 secs]
[GC 214977K-214223K(216920K), 0.0349902 secs]
[GC 216271K-215075K(217464K), 0.0221023 secs]
(... full garbage collection, at this point we know most memory in use is for FOP ...)
(... i.e. normal full garbage collection brings memory in use down to ~20MB ...)
(... so at -mx256m, FOP is ~40MB away from an out-of-memory error ...)
[Full GC 217123K-215866K(261888K), 5.5263608 secs]
[GC 217207K-216127K(261888K), 1.4741779 secs]
]
[64[GC 218175K-216477K(261888K), 0.0265551 secs]
[GC 218525K-216896K(261888K), 0.0246568 secs]
[GC 218944K-217318K(261888K), 0.0252940 secs]
][GC 219366K-217700K(261888K), 0.0236608 secs]
[65[GC 219748K-218130K(261888K), 0.0244581 secs]
[GC 220178K-218552K(261888K), 0.0248638 secs]
[GC 220600K-218971K(261888K), 0.0303567 secs]
]
[66[GC 221019K-219361K(261888K), 0.0230088 secs]
]
Parsing of document complete, stopping renderer
Initial heap size: 40908Kb
Current heap size: 219991Kb
Total memory used: 179083Kb
Memory use is indicative; no GC was performed
These figures should not be used comparatively
Total time used: 66075ms
Pages rendered: 66
Avg render time: 1001ms/page

-----Original Message-----
From: J.Pietschmann [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 22, 2003 12:35 PM
To: [EMAIL PROTECTED]
Subject: Re: Big/Huge XMLs

attachment: NAReportOutput.zip