Re: How FOP rendered my 17,344 page document...
Ah, I have a working multi-threaded memory patch that I developed, and it has been lying around with me simply because I just don't want to figure out how to commit from my stupid windoze machine. If anyone wants to see the work I have done, committers included, please take it from me and commit it!

sesha

----- Original Message -----
From: Mark [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, July 16, 2001 8:56 AM
Subject: Re: How FOP rendered my 17,344 page document...

Hi Fopsters,

> I'm personally only offended by being referred to as a fopsicle. Just kidding. :-)

I just love the word 'fop'; it has so many literary possibilities. :-)

> Seriously, this is great stuff. You should be aware that the initial memory buffering that you see with the -buf switch is only a limited portion of more extensive work that has yet to be folded into FOP.

[snip]

OK. I believe it will be necessary for the two systems to be aware of one another. I actually had to back out the changes to FOText that related to BufferManager, because BufferManager was holding onto the text well past its use-by millisecond. As I understand things, the BufferManager stuff will help with long, single page-sequences, and my stuff will help with multiple page-sequences. This should satisfy everyone! (Famous last words if ever I've written them!)

> This is open-source; nobody should be offended by people hacking away at code.

"Should be" being the operative words. Some open-source project participants don't like getting critiqued by outsiders, and a patch is basically a critique written in Java. So I'm very pleased and excited by your reaction to these ideas.

> Specific comments: IDReferences have mostly to do with <fo:page-number-citation/>; that is, the possibility is there that you need the page number of the page that contains the results of rendering the block with id="foo356", and you're currently on page 44, and the block with id="foo356" will end up on page 887, although you don't know that yet.
> You're right, this kind of stuff can cause major issues for pipelining. Nothing insurmountable, though.

OK. For my own personal 'itch' I don't care about page-number-citations, but obviously they must be supported. So I would propose that I work out how to deal with the citations in a PageSequence queue and use deferral for the time being. This would mean that documents containing references to future page-sequences will consume more memory, but should otherwise work just the same.

> The code you see in Root.java should not require page-sequence N+1 to be formatted before you render page-sequence N. All that's going on there is, if the force-page-count property on page-sequence N is "auto", it needs to know about the initial-page-number property on page-sequence N+1. This doesn't require any formatting to take place at all.

OK, well with the deferral mechanism the page-sequence will be parsed but not formatted/rendered immediately. I'll look at the code path for the force-page-count property and see how I can optimise it under this queue scheme.

So this is how I plan to proceed:

o Try to get the PDF renderer serializing much sooner. This appears to be a current bottleneck in complex documents, but I haven't run the profiler against them yet, since I only started processing large numbers of examples last night.

o Look at the IDReferences again and see if I can design a neat implementation that addresses the queuing issues.

So far all my work has been an unstructured hack (there are some nasty public statics in there to hold things together - um, ahh), so if I can show that the approach works, I intend to reimplement it from scratch against the latest CVS. I think what I'll do is, once I make everything work using my hacks, I'll post a summary of the changes I intend to make, so that people can give me feedback on the ideas.
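The deferral idea above could be sketched roughly like this. This is a toy model, not FOP code, and every class and method name here is hypothetical: page-sequences enter a FIFO queue as they are parsed, and the longest prefix whose forward references have all been resolved is flushed to the renderer, preserving page order.

```java
import java.util.*;

// Hypothetical sketch of the proposed deferral queue: parsed
// page-sequences wait in FIFO order; whenever an id resolves, the
// longest fully-resolved prefix is flushed to the renderer.
public class DeferralQueue {
    // Each entry: a page-sequence name plus the ids it still needs.
    static class PageSeq {
        final String name;
        final Set<String> unresolved = new HashSet<>();
        PageSeq(String name, String... ids) {
            this.name = name;
            unresolved.addAll(Arrays.asList(ids));
        }
    }

    private final Deque<PageSeq> queue = new ArrayDeque<>();
    private final List<String> rendered = new ArrayList<>();

    void add(PageSeq ps) { queue.add(ps); flush(); }

    // An id became known (its target block was placed on a page).
    void resolve(String id) {
        for (PageSeq ps : queue) ps.unresolved.remove(id);
        flush();
    }

    // Render the longest fully-resolved prefix, preserving order.
    private void flush() {
        while (!queue.isEmpty() && queue.peek().unresolved.isEmpty()) {
            rendered.add(queue.poll().name);
        }
    }

    List<String> rendered() { return rendered; }

    public static void main(String[] args) {
        DeferralQueue q = new DeferralQueue();
        q.add(new PageSeq("toc", "ch1"));  // cites a future sequence
        q.add(new PageSeq("ch1"));         // resolved, but queued behind toc
        q.resolve("ch1");                  // now both stream out, in order
        System.out.println(q.rendered());
    }
}
```

Note the cost Mark mentions: "ch1" is fully renderable but sits in memory behind "toc" until the forward reference resolves, so documents citing far-future page-sequences buffer more.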
Also, I have run some of the tests from the examples directory, and they appear to look fine, except that I don't know what they're *supposed* to look like! So how would I go about getting my code validated against XSL-FO? (I think there was a thread about this a little while ago, but sadly I didn't read it.) :(

Regards,
Mark

---
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]
How FOP rendered my 17,344 page document...
Hey there fopsicles,

Well, I just generated a 17,344 page document using FOP on Linux, JDK 1.3.1, with a 64Mb heap, into PDF. I also tried processing a 34,688 page document but ran out of memory on page 33,287 (bummer!). I suspect that the OOM error is almost certainly due to the PDFRenderer keeping its output in RAM, but that's OK by me for the moment. It looks easy enough to write the PDF on the fly, though.

There were surprisingly few changes I had to make to get FOP to pipeline from when the fo:page-sequence end tag is received. Of course, the code is a total hack, but I was just trying to see if I understood things properly and trying to prove the concept. I intend to try this out on a more complex document tomorrow.

There does appear to be a requirement in Root.java where it looks to the successive page-sequence for some data to do with page numbering. The simple solution to this (IMHO) is to defer rendering of a page-sequence until its successor is also formatted. I believe this would be simple to implement.

With regards to the IDReferences: I still don't know exactly what they are for, because I haven't even tried to look in the right places, but if I'm right, a given page-sequence might refer to objects in other page-sequences using an XML ID, or something. So I figure that the way to deal with this is to keep all unresolved references in a list in the PageSequence object, and defer rendering that page-sequence (and any subsequent page-sequences) until the reference list is resolved. Once again, I think this is a straightforward change.

It is not a perfect solution, because e.g. a table of contents presumably uses this IDReferences table, and that's normally going to be at the start of a document, so under this scheme we're back to square one. An alternative solution would be to force drivers to be able to write pages out of sequence, so that, for example, only the contents page would be deferred until its references are resolved.
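To make the forward-reference problem concrete, here is a minimal sketch assuming IDReferences behaves roughly like an id-to-page-number table. The class below is illustrative only, not FOP's actual IDReferences; all names are made up. A citation either resolves immediately or is parked until the target id is placed on a page.

```java
import java.util.*;

// Illustrative id -> page-number table with deferred citations.
// Not FOP's real IDReferences class; names are hypothetical.
public class IdTable {
    private final Map<String, Integer> pageOf = new HashMap<>();
    private final Map<String, List<String>> pending = new HashMap<>();
    private final List<String> resolvedLog = new ArrayList<>();

    // Called when the block with this id is actually placed on a page.
    void idPlaced(String id, int page) {
        pageOf.put(id, page);
        for (String cite : pending.getOrDefault(id, List.of()))
            resolvedLog.add(cite + " -> page " + page);
        pending.remove(id);
    }

    // A citation referring to id; returns the page number if known,
    // or null if the citation must be deferred (a forward reference).
    Integer cite(String citeName, String id) {
        Integer page = pageOf.get(id);
        if (page == null)
            pending.computeIfAbsent(id, k -> new ArrayList<>()).add(citeName);
        else
            resolvedLog.add(citeName + " -> page " + page);
        return page;
    }

    boolean hasPending() { return !pending.isEmpty(); }

    public static void main(String[] args) {
        IdTable t = new IdTable();
        System.out.println(t.cite("toc-line", "chapter-1")); // null: deferred
        t.idPlaced("chapter-1", 887);
        System.out.println(t.hasPending()); // false: citation resolved
    }
}
```

This shows why a front-of-document table of contents is the worst case: its citations point forward into almost every later page-sequence, so under pure deferral nothing can be rendered until nearly everything is formatted.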
This gets the memory-consuming stuff out of FOP, but means the drivers are harder to write (OTOH, since most pages are just streams, it would be trivial to write a helper class to deal with out-of-order pages and reassembly). That to me is a large change, and I am not suggesting this course of action at this particular time, but maybe it's something to think about.

Anyway, I intend to follow up on this work tomorrow. I would like to look at the IDReferences thing and stop talking out my bottom about it, and I would like to look at rendering much more complex documents to see if I've made too many assumptions. Also, I have only modified the PDF driver; I haven't even looked at the other ones yet. The changes to the PDF driver are very minor, though.

I hope no one is offended by my work/writing on this stuff. I realise that FOP is experimental, but the number of changes is surprisingly small and the results are just so cool. Memory use is significantly reduced for all cases where there is more than one page-sequence, and total time to render seems to be significantly reduced. If anyone is interested in a summary of the changes I made, then drop me a line.

Regards,
Mark
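The "helper class to deal with out-of-order pages and reassembly" mentioned above might look something like this toy sketch (hypothetical names, not part of any FOP driver): pages arrive in whatever order they finish, and are released to the output strictly in page order.

```java
import java.util.*;

// Toy reassembly buffer for out-of-order page completion.
// Pages are handed in as they finish; they are released to the
// output strictly in ascending page-number order.
public class PageReorderer {
    private final Map<Integer, String> held = new TreeMap<>();
    private final List<String> written = new ArrayList<>();
    private int next = 1; // the next page number the output expects

    void pageReady(int number, String content) {
        held.put(number, content);
        // Drain every contiguous page that is now available.
        while (held.containsKey(next)) {
            written.add(held.remove(next));
            next++;
        }
    }

    List<String> written() { return written; }

    public static void main(String[] args) {
        PageReorderer r = new PageReorderer();
        r.pageReady(2, "body-1");      // finished early, must wait
        r.pageReady(1, "contents");    // deferred page finally resolves
        System.out.println(r.written());
    }
}
```

Only the held-back pages (e.g. a deferred contents page and anything completed after it) occupy memory; everything else streams straight through.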
Re: How FOP rendered my 17,344 page document...
At 05:41 PM 7/15/01 +1000, Mark wrote:

> I hope no one is offended by my work/writing on this stuff. I realise that FOP is experimental, but the number of changes is surprisingly small and the results are just so cool. Memory use is significantly reduced for all cases where there is more than one page-sequence, and total time to render seems to be significantly reduced. If anyone is interested in a summary of the changes I made, then drop me a line.

I'm personally only offended by being referred to as a fopsicle. Just kidding. :-)

Seriously, this is great stuff. You should be aware that the initial memory buffering that you see with the -buf switch is only a limited portion of more extensive work that has yet to be folded into FOP. This latter is an extensive patch, and I have not had the time to commit it (well, there have been a few other glitches). The developer who has been doing that is now a committer, and I hope that the rest of the memory buffering code will soon appear in CVS. Nevertheless, it seems to me like you are doing complementary things, a different approach, and perhaps we will finally be able to see elements of both approaches working together.

This is open-source; nobody should be offended by people hacking away at code.

Specific comments: IDReferences have mostly to do with <fo:page-number-citation/>; that is, the possibility is there that you need the page number of the page that contains the results of rendering the block with id="foo356", and you're currently on page 44, and the block with id="foo356" will end up on page 887, although you don't know that yet. You're right, this kind of stuff can cause major issues for pipelining. Nothing insurmountable, though.

The code you see in Root.java should not require page-sequence N+1 to be formatted before you render page-sequence N. All that's going on there is, if the force-page-count property on page-sequence N is "auto", it needs to know about the initial-page-number property on page-sequence N+1.
This doesn't require any formatting to take place at all.

Regards,
Arved Sandstrom
Fairly Senior Software Type
e-plicity (http://www.e-plicity.com)
Wireless * B2B * J2EE * XML
Halifax, Nova Scotia
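A simplified illustration of why no formatting of sequence N+1 is needed. This is not FOP's real Root.java logic but a hypothetical reduction of it: with force-page-count="auto" on sequence N, the only question is whether to pad N with blank pages so that sequence N+1 can start on the parity its initial-page-number property ("auto-odd" or "auto-even", per the XSL property definition) requests, and that property is known as soon as N+1 is parsed.

```java
// Hypothetical reduction of the force-page-count="auto" decision:
// given the last page number of sequence N and the initial-page-number
// property of sequence N+1, how many blank pages must pad sequence N?
// Reading the property requires parsing N+1, not formatting it.
public class ForcePageCount {
    static int blankPagesNeeded(int lastPageOfN, String nextInitialPageNumber) {
        int next = lastPageOfN + 1; // page number N+1 would start on
        switch (nextInitialPageNumber) {
            case "auto-odd":  return (next % 2 == 1) ? 0 : 1;
            case "auto-even": return (next % 2 == 0) ? 0 : 1;
            default:          return 0; // "auto": no parity constraint
        }
    }

    public static void main(String[] args) {
        // Sequence N ends on page 5; next sequence wants an odd start.
        System.out.println(blankPagesNeeded(5, "auto-odd"));
    }
}
```

The real property also admits an explicit number and interacts with other force-page-count values; this sketch covers only the parity case the email discusses.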