Re: diffs for on-the-fly image support
Yes the several patches is good, thanks. This way the appropriate ones can be applied to both code bases.

I agree that 3 is probably better and should be done for the development code. 1 is suitable for a quick solution for the maintenance branch.

As for the extension, this is really for the development code. I don't know exactly where you are getting your data etc. from but the new code could handle this as an extension. The svg drawing itself is an extension and it could be done in the same way. You supply a handler on the user agent, this handler receives some xml data and has access to the pdf document, streams etc. This could make it easier but I would need more info.

On Tue, 2002-05-21 at 16:00, Paul Reavis wrote:
> Agreed. Here are some possible solutions:
>
> 1) a boolean switch (in the api or system properties)
> 2) intelligence in the buffer itself, where it uses a tempfile after a certain size is reached
> 3) better overall architecture where buffers are immediately flushed to output rather than remaining in memory
>
> (3) seems best and is in line with the next-gen design documents I see on the fop site, but I don't know how far along y'all are with that. I have to use a similar architecture for my map translation software; GIS systems are hundreds of megabytes and scalability requires a flat memory usage model. All my buffers are strictly memory-limited.
>
> (1) is easy enough
>
> (2) would be fine but probably has pitfalls; the problem is that there are a _lot_ of these buffers and PDFStreams running around, and therefore it's a global problem - I counted dozens for one plot, 24MB total.
>
> I was planning on using a switch for the cvs patch, unless y'all have (3) figured out.
>
> > I don't see the need for an extra PDFStreamGraphics2D class. Modifying the PDFGraphics2D should suffice.
>
> Agreed. I just didn't want to break the existing (the current patch uses PDFStreamGraphics2D just for my case).
>
> > An extension may work better in this situation with the development code. If I understand the problem properly.
>
> ?? An extension to the code, or a file extension for the URL? I'm not sure what you mean.
>
> As far as my plans for the other features:
>
> I figure the drawImage hack is a no-brainer. It's just the right thing to do in that instance. The additional memory usage should be no big deal (it's a hash of image pointer to integer ID).
>
> I'll just modify PDFGraphics2D directly to use the underlying PDFStream. I think this is fine for all cases.
>
> Should I break it up into several patches?
>
> - tempfile buffering
> - drawImage hack
> - PDFGraphics2D hack
> - on-the-fly images
Re: diffs for on-the-fly image support
Keiron Liddle wrote to FOP on Wed, May 22, 2002 at 10:30:45AM +0200:
> Yes the several patches is good, thanks. This way the appropriate ones can be applied to both code bases.
>
> I agree that 3 is probably better and should be done for the development code. 1 is suitable for a quick solution for the maintenance branch.
>
> As for the extension, this is really for the development code. I don't know exactly where you are getting your data etc. from but the new code could handle this as an extension. The svg drawing itself is an extension and it could be done in the same way. You supply a handler on the user agent, this handler receives some xml data and has access to the pdf document, streams etc. This could make it easier but I would need more info.

In brief, the algo is this:

1) before pdf generation, the client program sets up the on-the-fly snapshot objects - each is a subclass of OnTheFlyFopImage, supplying a paint(Graphics2D) routine.
2) the client then registers the images somewhere in the FOP api (in my current hack, with FopImageFactory directly) with a url like onthefly:uniquename
3) the client then runs the PDF generation
4) the PDFRenderer, when it encounters an external image reference with an onthefly:uniquename URL, looks up the correspondingly-named OnTheFlyFopImage in the registry
5) the PDFRenderer then sets up a PDFGraphics2D and runs OnTheFlyFopImage.paint on it.
6) at some point before or after pdf generation, the application can clear the registry, freeing up any memory used by the OnTheFlyFopImages.

If you can describe in general what the algo would be for an extension I'll be glad to try and implement it.

Incidentally, am I getting the development or maintenance branch when I just do a `cvs checkout`?

Here are the actual examples from my current (outside of FOP) code.
Incidentally, I really think there needs to be a library class with static methods like my convert() that allow a simple default embedding for folks - that's a lot of code to have to write just to run fop.

... snip

    public void createOnTheFly(MapViewPanel sourcePanel, File reportDir) {
        SystemLog.singleton().enter("Creating on-the-fly snapshots...");
        try {
            // start from an empty cache and an empty on-the-fly registry
            FopImageFactory.clearCache();
            FopImageFactory.clearOnTheFlyImages();
            Iterator e = getSnapshots().iterator();
            int i = 0;
            while (e.hasNext()) {
                RenderMold currentSnapshot = (RenderMold)e.next();
                currentSnapshot.setMonochromeBackground(monochromeBackground);
                currentSnapshot.setInvertBackgroundColor(!noColorFiltering);
                currentSnapshot.setPrinting(true);
                SystemLog.singleton().enter("Rendering snapshot " + currentSnapshot + " to image");
                this.setDrawFinerThanScale(currentSnapshot.getScale());
                // register the callback under a unique name
                FopImageFactory.addOnTheFlyImage("Snapshot" + i,
                                                 new OnTheFlySnapshot(sourcePanel, currentSnapshot));
                i++;
            }
            // wrap up
            this.setDrawFinerThanScale(null);
        } catch ( Exception oopsie ) {
            System.out.println("problem creating image in Snapshot source");
            Death.instant(oopsie);
        }
    }

... snip

    private class OnTheFlySnapshot extends OnTheFlyFopImage {
        private MapViewPanel sourcePanel;
        private RenderMold mold;

        public OnTheFlySnapshot(MapViewPanel sourcePanel, RenderMold mold) throws FopImageException {
            super("onthefly:Snapshot", viewFinder.getWidth(), viewFinder.getHeight());
            this.sourcePanel = sourcePanel;
            this.mold = mold;
        }

        public void paint(Graphics2D graphics) {
            if (isNoColorFiltering())
                GUILib.setRenderingHintsForPrinting(graphics);
            else
                GUILib.setRenderingHintsForInvertedPrinting(graphics);
            /*
            SystemLog.singleton().enter("Setting on-the-fly clip to: " + viewFinder.getWidth() + ", " + viewFinder.getHeight());
            graphics.setClip(0, 0, viewFinder.getWidth(), viewFinder.getHeight());
            */
            graphics.setFont(sourcePanel.getFont());
            // fill the background before painting the layers
            if (noColorFiltering) {
                graphics.setColor(Color.black);
                graphics.fillRect(0, 0, viewFinder.getWidth(), viewFinder.getHeight());
            } else {
                graphics.setColor(Color.white);
                graphics.fillRect(0, 0, viewFinder.getWidth(), viewFinder.getHeight());
            }
            // iterate through layers and renderers to paint
            Iterator it = sourcePanel.layers();
            while (it.hasNext()) {
                MapViewLayer currentLayer = (MapViewLayer)it.next();
                SystemLog.singleton().enter("Rendering " + currentLayer + " to PDF");
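
The six steps listed in this message can be sketched independently of FOP roughly as follows. This is only an illustration written in current Java, not the patch itself: OnTheFlyPainter and OnTheFlyRegistry are hypothetical names standing in for OnTheFlyFopImage and the registry inside FopImageFactory, and a BufferedImage stands in for the PDF-backed Graphics2D.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical callback type: paints one image on demand (stands in for OnTheFlyFopImage). */
interface OnTheFlyPainter {
    void paint(Graphics2D graphics);
}

/** Hypothetical registry keyed by the unique name that follows "onthefly:". */
final class OnTheFlyRegistry {
    static final String SCHEME = "onthefly:";
    private final Map<String, OnTheFlyPainter> painters = new ConcurrentHashMap<>();

    /** Step 2: the client registers a painter under a unique name. */
    void register(String name, OnTheFlyPainter painter) {
        painters.put(name, painter);
    }

    /** Step 4: resolve an image URL; null if it is not an onthefly: URL or the name is unknown. */
    OnTheFlyPainter lookup(String url) {
        if (url == null || !url.startsWith(SCHEME)) {
            return null;
        }
        return painters.get(url.substring(SCHEME.length()));
    }

    /** Step 5: run the callback against whatever Graphics2D the renderer supplies. */
    boolean render(String url, Graphics2D target) {
        OnTheFlyPainter painter = lookup(url);
        if (painter == null) {
            return false;
        }
        painter.paint(target);
        return true;
    }

    /** Step 6: drop all registered painters so their memory can be reclaimed. */
    void clear() {
        painters.clear();
    }
}

class OnTheFlyRegistryDemo {
    public static void main(String[] args) {
        OnTheFlyRegistry registry = new OnTheFlyRegistry();
        registry.register("Snapshot0", g -> g.fillRect(0, 0, 10, 10));

        // A BufferedImage stands in for the PDF-backed Graphics2D here.
        BufferedImage canvas = new BufferedImage(20, 20, BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics = canvas.createGraphics();
        System.out.println(registry.render("onthefly:Snapshot0", graphics));
        System.out.println(registry.render("file:logo.png", graphics));
        graphics.dispose();
        registry.clear();
    }
}
```

Because the painter only sees a Graphics2D, the same routine can draw to the screen or to the PDF, which is the property the message relies on.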
Re: diffs for on-the-fly image support
A normal cvs checkout gives you the development code, which is different from current maintenance releases.

What you are describing can definitely be done with an extension (in the devel code only, so this is for later).

in your fo:

    <instream-foreign-object width=".." height="..">
      <myImage xmlns="my-space" id="unique-id"/>
    </instream-foreign-object>

This small bit of xml will then be passed to your extension available on the user agent. This extension gets the image and sets up the PDFGraphics2D and does its thing.

It should be easier. This way the extra code is contained in a simple extension. The difference is that you need to use instream-foreign-object instead of image.

This class is the default pdf extension that handles svg:
http://cvs.apache.org/viewcvs.cgi/xml-fop/src/org/apache/fop/render/pdf/PDFXMLHandler.java?rev=1.4&content-type=text/vnd.viewcvs-markup

On Wed, 2002-05-22 at 14:42, Paul Reavis wrote:
> In brief, the algo is this:
>
> 1) before pdf generation, the client program sets up the on-the-fly snapshot objects - each is a subclass of OnTheFlyFopImage, supplying a paint(Graphics2D) routine.
> 2) the client then registers the images somewhere in the FOP api (in my current hack, with FopImageFactory directly) with a url like onthefly:uniquename
> 3) the client then runs the PDF generation
> 4) the PDFRenderer, when it encounters an external image reference with an onthefly:uniquename URL, looks up the correspondingly-named OnTheFlyFopImage in the registry
> 5) the PDFRenderer then sets up a PDFGraphics2D and runs OnTheFlyFopImage.paint on it.
> 6) at some point before or after pdf generation, the application can clear the registry, freeing up any memory used by the OnTheFlyFopImages.
>
> If you can describe in general what the algo would be for an extension I'll be glad to try and implement it.
>
> Incidentally, am I getting the development or maintenance branch when I just do a `cvs checkout`?
>
> Here are the actual examples from my current (outside of FOP) code.
>
> Incidentally, I really think there needs to be a library class with static methods like my convert() that allow a simple default embedding for folks - that's a lot of code to have to write just to run fop.
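
The extension approach described in this message amounts to dispatching a foreign XML element to a handler chosen by its namespace. The sketch below shows only that dispatch idea, in current Java. ForeignXmlHandler and HandlerRegistry are hypothetical names and are not the FOP user agent API; a StringBuilder stands in for the PDF document and streams that a real handler would be given.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical handler contract: receives the foreign XML element and an output sink. */
interface ForeignXmlHandler {
    void handle(Element element, StringBuilder output);
}

/** Hypothetical stand-in for the user agent: maps a namespace URI to its handler. */
final class HandlerRegistry {
    private final Map<String, ForeignXmlHandler> handlers = new HashMap<>();

    void addHandler(String namespaceUri, ForeignXmlHandler handler) {
        handlers.put(namespaceUri, handler);
    }

    /** Dispatches on the element's namespace; false if no handler is registered for it. */
    boolean dispatch(Element element, StringBuilder output) {
        ForeignXmlHandler handler = handlers.get(element.getNamespaceURI());
        if (handler == null) {
            return false;
        }
        handler.handle(element, output);
        return true;
    }
}

class ExtensionDemo {
    /** Parses a small XML fragment and returns its root element. */
    static Element parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getNamespaceURI() to return a value
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        HandlerRegistry agent = new HandlerRegistry();
        // A real handler would look up the image by id and paint it; this one only records the id.
        agent.addHandler("my-space",
                (element, sink) -> sink.append("paint ").append(element.getAttribute("id")));

        StringBuilder output = new StringBuilder();
        agent.dispatch(parse("<myImage xmlns=\"my-space\" id=\"unique-id\"/>"), output);
        System.out.println(output);
    }
}
```

The id attribute is what would tie the element back to a pre-registered on-the-fly image, so the registry from the earlier message and this handler could be combined.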
Re: diffs for on-the-fly image support
Hi Paul,

I don't think we can apply this patch directly for a number of reasons. Although there are parts in it with value that should be put into cvs when you have finished.

The patch should be done against cvs rather than what you did which seems to be in reverse anyway (I suppose this is what you are working on). It's better to avoid the various formatting changes which really confuses things.

Using temp files can cause problems in certain situations.

I don't see the need for an extra PDFStreamGraphics2D class. Modifying the PDFGraphics2D should suffice.

An extension may work better in this situation with the development code. If I understand the problem properly.

Thanks,
Keiron.

On Mon, 2002-05-20 at 17:43, Paul Reavis wrote:
> Attached are gzipped diffs for the changes I made vs. the 0.20.3 release. I'm working on patches against CVS, but am pretty busy and wanted to get something out soonest.
>
> Essentially the patch includes:
>
> - support for callback-based, on-the-fly images (URLs like onthefly:SomeImage, you have to preregister the callback named SomeImage before running the FOP transformation)
> - a modified PDFGraphics2D called PDFStreamGraphics2D that does not use an intermediate byte buffer, but renders direct to a PDFStream
> - modified PDFStream so that it caches to tempfiles on disk rather than to heap
> - modified the drawImage portion of PDFStreamGraphics2D so that it only creates a new xObject for the image if it has never seen that image before, otherwise it reuses the reference
>
> The combination of these things took us from render times of up to 10 minutes and hundreds of megabytes of heap to render times of less than 10 seconds and less than 64MB of heap (the default max heap size).
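
The drawImage change in the quoted patch description (create an xObject only the first time an image is seen, otherwise reuse the reference) can be illustrated with an identity-keyed map. This is a sketch under assumptions, not the patch: ImageXObjectCache is a hypothetical name, the integer IDs stand in for xObject references, and the code is written in current Java.

```java
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical cache: hands out one object ID per distinct Image instance. */
final class ImageXObjectCache {
    // IdentityHashMap compares keys with ==, so two references to the same Image share an ID.
    private final Map<Image, Integer> ids = new IdentityHashMap<>();
    private int nextId = 1;
    private int embedded = 0;

    /** Returns the ID to reference; the image is embedded only the first time it is seen. */
    int idFor(Image image) {
        Integer known = ids.get(image);
        if (known != null) {
            return known;
        }
        int id = nextId++;
        ids.put(image, id);
        embedded++; // a real renderer would write the image data to the PDF here
        return id;
    }

    /** Number of images that actually had to be embedded. */
    int embeddedCount() {
        return embedded;
    }
}

class ImageXObjectCacheDemo {
    public static void main(String[] args) {
        ImageXObjectCache cache = new ImageXObjectCache();
        Image icon = new BufferedImage(8, 8, BufferedImage.TYPE_INT_ARGB);
        Image other = new BufferedImage(8, 8, BufferedImage.TYPE_INT_ARGB);

        // Drawing the same icon many times yields one embedded object and many references.
        for (int i = 0; i < 1000; i++) {
            cache.idFor(icon);
        }
        cache.idFor(other);
        System.out.println(cache.embeddedCount());
    }
}
```

One design caveat: the map holds strong references to the images, so it should live no longer than the document being rendered.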
Re: diffs for on-the-fly image support
Keiron Liddle wrote to FOP on Tue, May 21, 2002 at 11:47:19AM +0200:
> I don't think we can apply this patch directly for a number of reasons. Although there are parts in it with value that should be put into cvs when you have finished.

I figured as much. Mainly I wanted to get an example out for anyone to look at; the code I wrote is hardly high quality and I would rather do a more careful modification against CVS.

Sorry about the formatting changes - emacs redid the indentation according to my weird standard and I was just too lazy to fix it back. Is there an apache or fop standard style or style canonicalizer? I know some projects use a tool to fix style to a standard.

> Using temp files can cause problems in certain situations.

Agreed. Here are some possible solutions:

1) a boolean switch (in the api or system properties)
2) intelligence in the buffer itself, where it uses a tempfile after a certain size is reached
3) better overall architecture where buffers are immediately flushed to output rather than remaining in memory

(3) seems best and is in line with the next-gen design documents I see on the fop site, but I don't know how far along y'all are with that. I have to use a similar architecture for my map translation software; GIS systems are hundreds of megabytes and scalability requires a flat memory usage model. All my buffers are strictly memory-limited.

(1) is easy enough

(2) would be fine but probably has pitfalls; the problem is that there are a _lot_ of these buffers and PDFStreams running around, and therefore it's a global problem - I counted dozens for one plot, 24MB total.

I was planning on using a switch for the cvs patch, unless y'all have (3) figured out.

> I don't see the need for an extra PDFStreamGraphics2D class. Modifying the PDFGraphics2D should suffice.

Agreed. I just didn't want to break the existing (the current patch uses PDFStreamGraphics2D just for my case).

> An extension may work better in this situation with the development code. If I understand the problem properly.

?? An extension to the code, or a file extension for the URL? I'm not sure what you mean.

As far as my plans for the other features:

I figure the drawImage hack is a no-brainer. It's just the right thing to do in that instance. The additional memory usage should be no big deal (it's a hash of image pointer to integer ID).

I'll just modify PDFGraphics2D directly to use the underlying PDFStream. I think this is fine for all cases.

Should I break it up into several patches?

- tempfile buffering
- drawImage hack
- PDFGraphics2D hack
- on-the-fly images

-- 
Paul Reavis
Design Lead
Partner Software, Inc.
http://www.partnersoft.com
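
Option (2) from this message, a buffer that switches to a tempfile once it passes a certain size, could look roughly like the sketch below. It is an illustration in current Java under assumptions, not code from the patch: SpillingBuffer, its threshold parameter and the "pdfstream" tempfile prefix are all made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

/** Hypothetical buffer: stays on the heap up to a threshold, then moves to a temp file. */
final class SpillingBuffer extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private File spillFile;       // null until the threshold is crossed
    private OutputStream fileOut; // open stream to spillFile once spilled
    private long size;

    SpillingBuffer(int thresholdBytes) {
        this.threshold = thresholdBytes;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] data, int off, int len) throws IOException {
        if (fileOut == null && size + len > threshold) {
            spill();
        }
        (fileOut != null ? fileOut : memory).write(data, off, len);
        size += len;
    }

    /** Moves everything buffered so far to a temp file and releases the heap copy. */
    private void spill() throws IOException {
        spillFile = File.createTempFile("pdfstream", ".tmp");
        spillFile.deleteOnExit();
        fileOut = new FileOutputStream(spillFile);
        memory.writeTo(fileOut);
        memory = null;
    }

    boolean isSpilled() {
        return spillFile != null;
    }

    long size() {
        return size;
    }

    /** Copies the buffered bytes to the final output; the buffer must not be written to afterwards. */
    void writeTo(OutputStream out) throws IOException {
        if (fileOut != null) {
            fileOut.close();
            Files.copy(spillFile.toPath(), out);
            Files.delete(spillFile.toPath());
        } else {
            memory.writeTo(out);
        }
    }
}

class SpillingBufferDemo {
    public static void main(String[] args) throws IOException {
        SpillingBuffer buffer = new SpillingBuffer(64 * 1024);
        buffer.write(new byte[100 * 1024], 0, 100 * 1024); // crosses the threshold
        ByteArrayOutputStream pdf = new ByteArrayOutputStream();
        buffer.writeTo(pdf);
        System.out.println(buffer.isSpilled() + " " + pdf.size());
    }
}
```

The pitfall raised in the message still applies: with dozens of such buffers per plot, a per-buffer threshold bounds each buffer but not the total, so a shared budget or option (3) would be needed for a truly flat memory profile.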
Re: diffs for on-the-fly image support
J.U. Anderegg wrote on Tue, May 21, 2002 at 05:31:53PM +0200:
> Inserting JPEG into a PDF file is a simple file copy - given the URI, bits/pixel and color model. The latter are coded within JPEG files. PDF stores the image once and allows multiple references to it. Is programmed caching superior to the caching of the file system?
>
> From PDF view, memory = (JPEG file size + PDF encoded image) is needed at most during the lifetime of an output page in memory. Why isn't that so: device independence, AWT compatibility?
>
> Similar considerations apply to GIF, TIFF and Fax formats.

I'm not sure exactly what you're referring to. My hacks primarily address the issue I had of rendering large vector plots of maps to pdf. The images that are used do not already exist as jpegs or any other form; they are an amalgam of vector routines and raster icons, and the icons are rotated in memory for speed.

Generating this mess to svg, then into pdf was very time consuming and memory-intensive. So I switched to rendering directly into the pdf using the existing PDFGraphics2D, which allowed me to use the exact same routines that I use to render to the AWT window.

Once I got that working I ran into memory problems, because the current design of the PDF generation code keeps a lot of things in memory as buffers, I believe because it doesn't know exactly where in the pdf file the data will be placed at final output - it's juggling layout etc. So, this is not a case of file buffering, but of storing chunks of rendered pdf for later use. My hack puts them in tempfiles rather than in in-memory buffers. This is obviously slower but more scalable.

As for programmed caching being superior to file system caching, well, that's another debate and really depends on the operating system. For windows systems, especially over SMB networks, the answer is generally yes, because they are very unaggressive about disk caching and they flush a lot. Linux on the other hand is very aggressive (on my 754MB development machine, 320MB is being used for disk cache), and flushes less often. At least in my experience... in one case I got a 20x speed increase with a decent caching framework; I finally noticed that machines with crappy disk drive subsystems were far slower than those with good ones even at the same memory and cpu speed.

-- 
Paul Reavis
Design Lead
Partner Software, Inc.
http://www.partnersoft.com