Re: How FOP rendered my 17,344 page document...

2001-07-18 Thread Seshadri G.K.

Ah, I have a working multi-threaded memory patch that I developed, and it's
been lying around with me simply because I just don't want to figure out how
to commit from my stupid windoze machine. If anyone wants to see the work I
have done, committers included, please take it from me and commit it!

sesha

- Original Message -
From: Mark [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, July 16, 2001 8:56 AM
Subject: Re: How FOP rendered my 17,344 page document...


 Hi Fopsters

  I'm personally only offended by being referred to as a fopsicle. Just
  kidding. :-)

 I just love the word 'fop', it has so many literary possibilities. :-)

  Seriously, this is great stuff. You should be aware that the initial
memory
  buffering that you see with the -buf switch is only a limited portion of
  more extensive work that has yet to be folded into FOP.

 [snip]

 OK. I believe it will be necessary for the two systems to be aware of
 one another. I actually had to back out the changes to FOText that
 related to BufferManager because BufferManager was holding onto the text
 well past its use-by millisecond. As I understand things, the
 BufferManager stuff will help with long, single page-sequences, and my
 stuff will help with multiple page sequences. This should satisfy
 everyone! (Famous last words if ever I've written them!)

  This is open-source; nobody should be offended by people hacking away at
code.

 'Should be' being the operative words. Some open-source project
 participants don't like getting critiqued by outsiders - and a patch is
 basically a critique written in Java. So I'm very pleased and excited
 with your reaction to these ideas.

  Specific comments: IDReferences have mostly to do with
  <fo:page-number-citation/>. That is, the possibility is there that you
  need the page number that contains the results of rendering the block
  with id="foo356", and you're currently on page 44, and the block with
  id="foo356" will end up on page 887, although you don't know that yet.
  You're right, this kind of stuff can cause major issues for pipelining.
  Nothing insurmountable, though.

 OK. For my own personal 'itch' I don't care about page-number-citations,
 but obviously it must be supported. So I would propose that I work out
 how to deal with the citations in a PageSequence queue and use deferral
 for the time being. This would mean that documents that contain
 references to future page-sequences will consume more memory, but should
 otherwise work just the same.

  The code you see in Root.java should not require page-sequence N+1 to be
  formatted before you render page-sequence N. All that's going on there
is,
  if the force-page-count property on page-sequence N is auto, it
needs to
  know about the initial-page-number property on page-sequence N+1. This
  doesn't require any formatting to take place at all.

 OK, well with the deferral mechanism the page-sequence will be parsed
 but not formatted/rendered immediately. I'll look at the code path for
 the force-page-count property and see how I can optimise it under this
 queue scheme.

 So this is how I plan to proceed:

 o Try to get the PDF renderer serializing much sooner. This appears to
 be a current bottleneck in complex documents, but I haven't run the
 profiler against them yet since I only started processing large numbers
 of examples last night.

 o Look at the IDReferences again and see if I can design a neat
 implementation that addresses the queuing issues.

 So far all my work has been an unstructured hack (there are some nasty
 public statics in there to hold things together - um ahh), so - if I can
 show that the approach works, I intend to reimplement it from scratch
 against the latest CVS. I think what I'll do is, once I make everything
 work using my hacks, I'll post a summary of the changes I intend to
 make, so that people can give me feedback on the ideas.

 Also, I have run some of the tests from the examples directory and they
 appear to look fine, except that I don't know what they're *supposed* to
 look like!! So how would I go about getting my code validated against
 XSL-FO? (I think there was a thread about this a little while ago but
 sadly I didn't read it ):

 Regards
 Mark



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]




How FOP rendered my 17,344 page document...

2001-07-15 Thread Mark



Hey there fopsicles

Well, I just generated a 17,344 page document using FOP on Linux (JDK 1.3.1, 64 MB heap) into PDF. I also tried processing a 34,688 page document but ran out of memory on page 33,287 (bummer!). I suspect that the OOM error is almost certainly due to the PDFRenderer keeping its output in RAM, but that's OK by me for the moment. It looks easy enough to write the PDF on the fly, though.
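For anyone curious what "writing the PDF on the fly" could look like, here is a minimal sketch. None of this is FOP's actual PDFRenderer API; the class and method names are made up for illustration. The idea is to serialize each finished page object straight to the output stream, keeping in memory only the byte offsets needed for the final cross-reference table:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: stream each PDF object out as soon as it is complete,
// retaining only the xref offsets in memory. Not FOP's real renderer API.
public class StreamingPdfWriter {
    private final OutputStream out;
    private final List<Long> offsets = new ArrayList<>(); // xref offsets only
    private long bytesWritten = 0;

    public StreamingPdfWriter(OutputStream out) throws IOException {
        this.out = out;
        write("%PDF-1.3\n");
    }

    /** Serialize one finished page object immediately; nothing is buffered. */
    public void writePage(String pageContent) throws IOException {
        offsets.add(bytesWritten);
        int objNum = offsets.size();
        write(objNum + " 0 obj\n<< /Length " + pageContent.length()
                + " >>\nstream\n" + pageContent + "\nendstream\nendobj\n");
    }

    /** Emit the cross-reference table and trailer once all pages are out. */
    public void finish() throws IOException {
        long xrefStart = bytesWritten;
        StringBuilder xref = new StringBuilder("xref\n0 " + (offsets.size() + 1) + "\n");
        xref.append("0000000000 65535 f \n");
        for (long off : offsets) {
            xref.append(String.format("%010d 00000 n \n", off));
        }
        write(xref + "trailer\n<< /Size " + (offsets.size() + 1)
                + " >>\nstartxref\n" + xrefStart + "\n%%EOF\n");
        out.flush();
    }

    private void write(String s) throws IOException {
        byte[] b = s.getBytes("ISO-8859-1");
        out.write(b);
        bytesWritten += b.length;
    }
}
```

The point of the design is that memory cost grows with the number of pages (one long offset each), not with their content.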

There were surprisingly few changes I had to make in order to get FOP to pipeline from when the closing </fo:page-sequence> tag is received. Of course, the code is a total hack, but I was just trying to see if I understood things properly and trying to prove the concept.

I intend to try this out on a more complex document tomorrow.

There does appear to be a requirement in Root.java where it looks to the successive page-sequence for some data to do with page numbering. The simple solution to this (IMHO) is to defer rendering of a page sequence until its successor is also formatted. I believe this would be simple to implement.

With regards to the IDReferences: I still don't know exactly what they are for, because I haven't even tried to look in the right places, but if I'm right, a given page-sequence might refer to objects in other page-sequences using an XML ID, or something. So I figure that the way to deal with this is to keep all unresolved references in a list in the PageSequence object, and defer rendering that page-sequence (and any subsequent page-sequences) until the reference list is resolved. Once again I think this is a straightforward change.

It is not a perfect solution, because e.g. a table of contents presumably uses this IDReferences table, and that's normally going to be at the start of a document, so under this scheme we're back to square one. An alternative solution would be to force drivers to be able to write pages out of sequence, so that, for example, only the contents page would be deferred until its references are resolved. This gets the memory-consuming stuff out of FOP but means the drivers are harder to write (OTOH, since most pages are just streams, it would be trivial to write a helper class to deal with out-of-order pages and reassembly). That to me is a large change and I am not suggesting this course of action at this particular time, but maybe it's something to think about.
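The deferral scheme described above can be sketched roughly like this (the class names are hypothetical, not FOP's real ones): each page-sequence carries its set of unresolved IDs, sequences render strictly in order, and the queue stalls at the first sequence that still has an outstanding reference until that ID is resolved:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the deferral idea; not FOP's actual classes.
class PageSequence {
    final String name;
    final Set<String> unresolvedRefs = new HashSet<>();
    PageSequence(String name, String... refs) {
        this.name = name;
        for (String r : refs) unresolvedRefs.add(r);
    }
}

public class DeferredRenderQueue {
    private final Deque<PageSequence> queue = new ArrayDeque<>();
    private final Set<String> resolvedIds = new HashSet<>();
    private final StringBuilder renderLog = new StringBuilder();

    public void enqueue(PageSequence ps) {
        queue.addLast(ps);
        flush();
    }

    /** Called when formatting assigns a page number to an ID. */
    public void resolveId(String id) {
        resolvedIds.add(id);
        flush();
    }

    // Render from the head while the head has no outstanding references;
    // a blocked head defers itself and every sequence behind it.
    private void flush() {
        while (!queue.isEmpty()) {
            PageSequence head = queue.peekFirst();
            head.unresolvedRefs.removeAll(resolvedIds);
            if (!head.unresolvedRefs.isEmpty()) return;
            queue.removeFirst();
            renderLog.append(head.name).append(';'); // stand-in for rendering
        }
    }

    public String renderedSoFar() { return renderLog.toString(); }
}
```

This shows the table-of-contents problem directly: a TOC sequence full of forward references parks itself at the head of the queue and holds everything behind it in memory until the last referenced ID resolves.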

Anyway, I intend to follow up on this work tomorrow. I would like to look at the ID references thing and stop talking out my bottom about it, and to look at rendering much more complex documents to see whether I've made too many assumptions.

Also I have only modified the PDF driver, I haven't even looked at the other ones yet. The changes to the PDF driver are very minor though.

I hope no one is offended by my work/writing on this stuff. I realise that FOP is experimental, but the number of changes is surprisingly small and the results are just so cool. Memory use is significantly reduced for all cases where there is more than one page-sequence, and total time to render seems to be significantly reduced. If anyone is interested in a summary of the changes I made, then drop me a line.

Regards,
Mark



Re: How FOP rendered my 17,344 page document...

2001-07-15 Thread Arved Sandstrom

At 05:41 PM 7/15/01 +1000, Mark wrote:
 I hope no one is offended by my work/writing on this stuff, I realise that
FOP is experimental but the number of changes is surprisingly small and
the results are just so cool. Memory use is significantly reduced for all
cases where there is more than one page-sequence, and total time to render
seems to be significantly reduced. If anyone is interested in a summary of
the changes I made then drop me a line.

I'm personally only offended by being referred to as a fopsicle. Just 
kidding. :-)

Seriously, this is great stuff. You should be aware that the initial memory 
buffering that you see with the -buf switch is only a limited portion of 
more extensive work that has yet to be folded into FOP. This latter is an 
extensive patch, and I have not had the time to commit it (well, there have 
been a few other glitches). The developer that has been doing that is now a 
committer, and I hope that the rest of the memory buffering code will soon 
appear in CVS. Nevertheless, it seems to me like you are doing complementary 
things, a different approach, and perhaps we will be able to finally see 
elements of both approaches working together.

This is open-source; nobody should be offended by people hacking away at code.

Specific comments: IDReferences have mostly to do with 
<fo:page-number-citation/>. That is, the possibility is there that you need 
the page number that contains the results of rendering the block with 
id="foo356", and you're currently on page 44, and the block with id="foo356" 
will end up on page 887, although you don't know that yet. You're right, 
this kind of stuff can cause major issues for pipelining. Nothing 
insurmountable, though.

The code you see in Root.java should not require page-sequence N+1 to be 
formatted before you render page-sequence N. All that's going on there is, 
if the force-page-count property on page-sequence N is auto, it needs to 
know about the initial-page-number property on page-sequence N+1. This 
doesn't require any formatting to take place at all.
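A rough sketch of that lookahead, under the simplifying assumption that page numbers have tracked physical page position so far. The helper class is illustrative, not FOP code; only the property value names follow XSL-FO:

```java
// Sketch of the lookahead described above: whether page-sequence N needs a
// trailing blank page depends only on the *property value* of
// initial-page-number on page-sequence N+1, not on any formatting of N+1.
// The class and the parity assumption are illustrative, not FOP's code.
public class ForcePageCountHelper {

    /**
     * Decide whether a blank page must be appended to page-sequence N so
     * that page-sequence N+1 can begin on the kind of page it asks for.
     *
     * @param lastPageOfN         number of the last page produced by N
     * @param initialPageNumberN1 initial-page-number property of N+1:
     *                            "auto", "auto-odd", "auto-even", or a number
     */
    public static boolean needsBlankPage(int lastPageOfN, String initialPageNumberN1) {
        int next = lastPageOfN + 1; // number N+1 would get by continuing
        switch (initialPageNumberN1) {
            case "auto":
                return false;             // N+1 simply continues the count
            case "auto-odd":
                return next % 2 == 0;     // pad so the next page is odd
            case "auto-even":
                return next % 2 == 1;     // pad so the next page is even
            default:
                // Explicit number: pad when the requested starting number and
                // the next physical page would land on opposite sides of the
                // recto/verso pairing (assumes numbers tracked position so far).
                int start = Integer.parseInt(initialPageNumberN1);
                return (start % 2) != (next % 2);
        }
    }
}
```

No formatting of page-sequence N+1 happens anywhere here, which is Arved's point: the property value alone is enough, so the pipeline only needs to have *parsed* N+1, not rendered it.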

Regards,
Arved Sandstrom

Fairly Senior Software Type
e-plicity (http://www.e-plicity.com)
Wireless * B2B * J2EE * XML --- Halifax, Nova Scotia

