Hi Mark,

I am also looking for a solution to this memory problem. I like your
analysis. Please do continue to work on it, apply the patches, and
keep me posted.

Cheers,

Chetan Vig



Mark wrote:

> Howdy fopsters
>
> Yesterday I spent about 10 hours with FOP, vi, and a memory profiler
> (JProbe 3 profiler - evaluation version, amazing product, but rather
> expensive :-{ ). I have some observations of the FOP code that I
> thought would be useful to share.
>
> Firstly I understand that FOP is being redesigned. I followed the
> thread there a bit and one of the main concerns, as I recall, was that
> FOP is hard for newbies (like me) to understand. I would say that this
> is not true. I actually found FOP reasonably easy to get around.
> Considering that yesterday is the first time I've managed to really
> get into the guts of it, and that I probably am familiar with only
> half a percent of the code base, I would say that generally FOP is
> pretty well written and reasonably straightforward. Perhaps there are
> areas of it that need refactoring and documenting (and formatting to 8
> character tabs ;) but from what I saw of it ... I figured that any
> redesign would be equally impenetrable, as all large code bases are.
> Anyway, that's just my .02c from the 'newbie FOP programmer'
> perspective.
>
> My goal for the profiling was to enable FOP to process large
> documents in a standard environment (64MB heap on JDK 1.3.1 for
> Linux). I have a 4,500 page document (containing a very simple
> structure) and standard FOP dies at 600 pages. (By the end of the day
> it was up to over 3,000 pages). I discovered the "-buf" argument and
> tried that but that only extended the run to 900 pages and took about
> 3x the amount of time. My eventual goal is for fast processing of an
> unlimited number of pages, but I'll personally be happy at 10,000
> slightly more complex pages.
>
> Basically what I discovered with the profiler was that there are a
> large number of objects being held onto for a long time (obviously
> ;). In particular, the Block object holds a reference to a BlockArea
> object in a member field (blockArea), but the blockArea member is
> referenced only twice in other methods for simple field calls. Making
> the BlockArea a local variable for the layout(Area) method
> significantly reduces memory consumption. (I'll post patches, if
> desired, once the thread is complete). In otherwise unmodified code,
> this change increased the number of pages I could process from 600 to
> over 1,000.
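
A minimal Java sketch of the field-versus-local change described
above. The class names (AreaSketch, FieldBlockSketch, LocalBlockSketch,
RetentionDemo) are hypothetical stand-ins, not FOP's actual Block and
BlockArea code, and the sketch assumes the parent area is the only
other holder of the block area:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an area tree node; not FOP's actual class.
final class AreaSketch {
    final List<AreaSketch> children = new ArrayList<>();
    final byte[] layoutData = new byte[1024]; // pretend per-area layout state
}

// Before: the area is stored in a member field, so it stays reachable
// for as long as the formatting object itself is alive.
final class FieldBlockSketch {
    AreaSketch blockArea;

    void layout(AreaSketch parent) {
        blockArea = new AreaSketch();
        parent.children.add(blockArea);
    }
}

// After: the area is a local variable; once the parent drops it,
// the formatting object holds no reference to it.
final class LocalBlockSketch {
    void layout(AreaSketch parent) {
        AreaSketch blockArea = new AreaSketch();
        parent.children.add(blockArea);
    }
}

final class RetentionDemo {
    public static void main(String[] args) {
        AreaSketch page = new AreaSketch();
        FieldBlockSketch before = new FieldBlockSketch();
        LocalBlockSketch after = new LocalBlockSketch();
        before.layout(page);
        after.layout(page);

        page.children.clear(); // the page has been rendered and released

        // 'before' still pins its area through the field; 'after' does not.
        System.out.println("field still set: " + (before.blockArea != null));
    }
}
```

In the field version every Block keeps one area alive until the Block
itself is released, which would add up over thousands of pages.
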
>
> Another object that's hanging around a lot is the Page object. This
> one was trickier and it took me ages to work out how to deal with
> it, but I finally discovered that the processing in FOP can
> apparently be pipelined. So I hacked FOP to pipeline the format->render
> cycle, so that as each page is formatted it is sent to the (PDF) renderer
> immediately. This hack allowed me to increase the page count to over
> 3,000 pages, but the problem is that the hack made the IDReferences
> always empty. More on that shortly.
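
A rough Java sketch of the format->render pipelining idea described
above. All names here (PageSink, PipelineSketch, formatAll,
formatPipelined) are hypothetical and pages are plain strings; this is
not FOP's API, only an illustration of handing each page to the
renderer as soon as it is formatted:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback standing in for a renderer; not FOP's API.
interface PageSink {
    void accept(String page);
}

final class PipelineSketch {
    // Batch style: every formatted page is kept until formatting finishes.
    static List<String> formatAll(int pageCount) {
        List<String> pages = new ArrayList<>();
        for (int i = 1; i <= pageCount; i++) {
            pages.add("page " + i);
        }
        return pages;
    }

    // Pipelined style: each page goes to the sink as soon as it is
    // formatted, so the formatter never accumulates a list of pages.
    static void formatPipelined(int pageCount, PageSink sink) {
        for (int i = 1; i <= pageCount; i++) {
            sink.accept("page " + i);
        }
    }

    public static void main(String[] args) {
        final int[] rendered = {0};
        formatPipelined(3000, page -> rendered[0]++);
        System.out.println("rendered " + rendered[0] + " pages");
    }
}
```

The trade-off is the one noted above: a page that has already been
handed to the sink can no longer be revisited, so anything that needs
information from later pages has to be handled some other way.
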
>
> It seems to me from inspection and my limited knowledge of what I'm
> doing, that it should be possible to completely pipeline FOP at the
> level of the fo:page-sequence, without major changes to FOP. My
> experience so far with pipelining the format-render steps was very
> positive (it worked first time!) and I am interested in looking at
> formatting immediately after receiving the </fo:page-sequence>. I also
> noted in my travels through the code that PDFRenderer seems to hang
> onto a lot of stuff that it could probably write out immediately, if
> it had a stream to write to. So, all this raises a few questions:
>
> * How to deal with the IDReferences? (let the renderer deal with it?
> do two passes?). I don't fully understand the purpose and
> implementation of IDReferences at the moment but at a guess it's used
> to resolve forward references in the FO file...? I'm sure that it's
> possible to at least optimize this.
>
> * Is there something in XSL-FO that means I can't process on receipt
> of </fo:page-sequence>?
>
> * Am I just being stupid?
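
On the first question above: if IDReferences does resolve forward
references (a guess, as stated), one way to keep that working in a
pipeline is to track which ids are still unresolved, so that only the
pages waiting on them need to be held back. A minimal Java sketch,
with hypothetical names (ForwardRefSketch, define, resolve,
unresolved); this is not how FOP's IDReferences is implemented:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not FOP's IDReferences implementation.
final class ForwardRefSketch {
    private final Map<String, Integer> pageOfId = new HashMap<>();
    private final Set<String> pending = new LinkedHashSet<>();

    // Called when an element carrying an id is laid out on a page.
    void define(String id, int pageNumber) {
        pageOfId.put(id, pageNumber);
        pending.remove(id);
    }

    // Called when a reference to an id is met. Returns the page number,
    // or -1 if the target has not been laid out yet (a forward reference).
    int resolve(String id) {
        Integer page = pageOfId.get(id);
        if (page == null) {
            pending.add(id);
            return -1;
        }
        return page;
    }

    // Ids that have been referenced but not yet defined.
    List<String> unresolved() {
        return new ArrayList<>(pending);
    }

    public static void main(String[] args) {
        ForwardRefSketch refs = new ForwardRefSketch();
        System.out.println(refs.resolve("appendix")); // not defined yet
        refs.define("appendix", 42);
        System.out.println(refs.resolve("appendix")); // now known
    }
}
```
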
>
> The benefits to this approach as I see it are:
>
> * Probably not too many changes to FOP internals
>
> * Will use significantly less memory even for small jobs
>
> * Will increase the number of applications for FOP
>
> I'm interested in continuing this line of experimental work if
> anyone's interested. As always, kudos to the developers; I spent a very
> pleasant Saturday with your code.
>
> Regards
> Mark Lillywhite

