Re: diffs for on-the-fly image support

2002-05-22 Thread Keiron Liddle


Yes, several patches is good, thanks.
This way the appropriate ones can be applied to both code bases.

I agree that 3 is probably better and should be done for the development
code. 1 is suitable for a quick solution for the maintenance branch.

As for the extension, this is really for the development code. I don't
know exactly where you are getting your data etc. from but the new code
could handle this as an extension. The svg drawing itself is an
extension and it could be done in the same way. You supply a handler on
the user agent, this handler receives some xml data and has access to
the pdf document, streams etc. This could make it easier but I would
need more info.

On Tue, 2002-05-21 at 16:00, Paul Reavis wrote:
 Agreed. Here are some possible solutions:
 1) a boolean switch (in the api or system properties)
 2) intelligence in the buffer itself, where it uses a tempfile after a
 certain size is reached
 3) better overall architecture where buffers are immediately flushed
 to output rather than remaining in memory
 
 (3) seems best and is in line with the next-gen design documents I see
 on the fop site, but I don't know how far along y'all are with that. I
 have to use a similar architecture for my map translation software;
 GIS systems are hundreds of megabytes and scalability requires a flat
 memory usage model. All my buffers are strictly memory-limited.
 
 (1) is easy enough
 
 (2) would be fine but probably has pitfalls; the problem is that there
 are a _lot_ of these buffers and PDFStreams running around, and
 therefore it's a global problem - I counted dozens for one plot, 24MB
 total. 
 
 I was planning on using a switch for the cvs patch, unless y'all have
 (3) figured out.
 
  I don't see the need for an extra PDFStreamGraphics2D class. Modifying
  the PDFGraphics2D should suffice.
 
 Agreed. I just didn't want to break the existing code (the current patch
 uses PDFStreamGraphics2D just for my case).
  
  An extension may work better in this situation with the development
  code. If I understand the problem properly.
 
 ?? An extension to the code, or a file extension for the URL? I'm not
 sure what you mean.
 
 As far as my plans for the other features:
 
 I figure the drawImage hack is a no-brainer. It's just the right thing
 to do in that instance. The additional memory usage should be no big
 deal (it's a hash of image pointer to integer ID).
 
 I'll just modify PDFGraphics2D directly to use the underlying
 PDFStream. I think this is fine for all cases.
 
 Should I break it up into several patches?
 - tempfile buffering
 - drawImage hack
 - PDFGraphics2D hack
 - on-the-fly images



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]




Re: diffs for on-the-fly image support

2002-05-22 Thread Paul Reavis

Keiron Liddle ([EMAIL PROTECTED]) wrote to FOP on Wed, May 22, 2002 at 10:30:45AM +0200:

 
 Yes the several patches is good, thanks.
 This way the appropriate ones can be applied to both code bases.
 
 I agree that 3 is probably better and should be done for the development
 code. 1 is suitable for a quick solution for the maintenance branch.
 
 As for the extension, this is really for the development code. I don't
 know exactly where you are getting your data etc. from but the new code
 could handle this as an extension. The svg drawing itself is an
 extension and it could be done in the same way. You supply a handler on
 the user agent, this handler receives some xml data and has access to
 the pdf document, streams etc. This could make it easier but I would
 need more info.

In brief, the algo is this:

1) before pdf generation, the client program sets up the on-the-fly
snapshot objects - each is a subclass of OnTheFlyFopImage, supplying a
paint(Graphics2D) routine.

2) the client then registers the images somewhere in the FOP api (in
my current hack, with FopImageFactory directly) with a url like onthefly:uniquename

3) the client then runs the PDF generation

4) the PDFRenderer, when it encounters an external image reference
with an onthefly:uniquename URL, looks up the correspondingly-named
OnTheFlyFopImage in the registry 

5) the PDFRenderer then sets up a PDFGraphics2D and runs
OnTheFlyFopImage.paint on it.

6) at some point before or after pdf generation, the application can
clear the registry, freeing up any memory used by the OnTheFlyFopImages.
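
Steps 1 to 6 above can be sketched in a few lines of modern Java. This is only an illustration of the registry idea, not the code from the patch: the OnTheFlyPainter interface, the class name and the method names are hypothetical stand-ins for OnTheFlyFopImage and the FopImageFactory hooks, and a BufferedImage's Graphics2D stands in for PDFGraphics2D.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OnTheFlyRegistrySketch {

    /** Hypothetical stand-in for OnTheFlyFopImage: something that can paint itself. */
    interface OnTheFlyPainter {
        void paint(Graphics2D graphics);
    }

    static final String SCHEME = "onthefly:";

    private static final Map<String, OnTheFlyPainter> REGISTRY = new ConcurrentHashMap<>();

    /** Step 2: the client registers a painter under a unique name. */
    static void register(String name, OnTheFlyPainter painter) {
        REGISTRY.put(name, painter);
    }

    /** Step 4: resolve an image URL; returns null if it is not a known onthefly: URL. */
    static OnTheFlyPainter lookup(String url) {
        if (url == null || !url.startsWith(SCHEME)) {
            return null;
        }
        return REGISTRY.get(url.substring(SCHEME.length()));
    }

    /** Step 6: drop all registered painters so their memory can be reclaimed. */
    static void clear() {
        REGISTRY.clear();
    }

    public static void main(String[] args) {
        // Steps 1 and 2: set up and register a painter before rendering starts.
        register("Snapshot0", g -> g.fillRect(0, 0, 10, 10));

        // Steps 4 and 5: the renderer meets the URL, looks it up and paints.
        // An in-memory image is used here instead of a PDF graphics context.
        BufferedImage canvas = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        OnTheFlyPainter painter = lookup("onthefly:Snapshot0");
        if (painter != null) {
            painter.paint(g);
        }
        g.dispose();

        // Step 6: free the registry once generation is done.
        clear();
    }
}
```

A static registry keeps the example short; a per-run registry object would avoid sharing state between concurrent renderings.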

If you can describe in general what the algo would be for an extension
I'll be glad to try and implement it. Incidentally, am I getting the
development or maintenance branch when I just do a `cvs checkout`?

Here are the actual examples from my current (outside of FOP)
code. Incidentally, I really think there needs to be a library class
with static methods like my convert() that allow a simple default
embedding for folks - that's a lot of code to have to write just to
run fop.

... snip 

public void createOnTheFly(MapViewPanel sourcePanel, File reportDir) {
    SystemLog.singleton().enter("Creating on-the-fly snapshots...");
    try {
        // start clean: drop cached images and earlier on-the-fly registrations
        FopImageFactory.clearCache();
        FopImageFactory.clearOnTheFlyImages();

        Iterator e = getSnapshots().iterator();
        int i = 0;
        while (e.hasNext()) {
            RenderMold currentSnapshot = (RenderMold) e.next();
            currentSnapshot.setMonochromeBackground(monochromeBackground);
            currentSnapshot.setInvertBackgroundColor(!noColorFiltering);
            currentSnapshot.setPrinting(true);
            SystemLog.singleton().enter("Rendering snapshot " + currentSnapshot + " to image");
            this.setDrawFinerThanScale(currentSnapshot.getScale());

            // register the callback image under the name Snapshot0, Snapshot1, ...
            FopImageFactory.addOnTheFlyImage("Snapshot" + i,
                    new OnTheFlySnapshot(sourcePanel, currentSnapshot));
            i++;
        }

        // wrap up
        this.setDrawFinerThanScale(null);
    }
    catch (Exception oopsie) {
        System.out.println("problem creating image in Snapshot source");
        Death.instant(oopsie);
    }
}


... snip 

private class OnTheFlySnapshot extends OnTheFlyFopImage {

    private MapViewPanel sourcePanel;
    private RenderMold mold;

    public OnTheFlySnapshot(MapViewPanel sourcePanel, RenderMold mold)
            throws FopImageException {
        super("onthefly:Snapshot", viewFinder.getWidth(), viewFinder.getHeight());
        this.sourcePanel = sourcePanel;
        this.mold = mold;
    }

    // called by the renderer with a PDF-backed Graphics2D
    public void paint(Graphics2D graphics) {
        if (isNoColorFiltering())
            GUILib.setRenderingHintsForPrinting(graphics);
        else
            GUILib.setRenderingHintsForInvertedPrinting(graphics);

        /*
        SystemLog.singleton().enter("Setting on-the-fly clip to: " +
            viewFinder.getWidth() + ", " + viewFinder.getHeight());
        graphics.setClip(0, 0, viewFinder.getWidth(), viewFinder.getHeight());
        */
        graphics.setFont(sourcePanel.getFont());
        // fill the background before drawing any layers
        if (noColorFiltering) {
            graphics.setColor(Color.black);
            graphics.fillRect(0, 0, viewFinder.getWidth(), viewFinder.getHeight());
        }
        else {
            graphics.setColor(Color.white);
            graphics.fillRect(0, 0, viewFinder.getWidth(), viewFinder.getHeight());
        }

        // iterate through layers and renderers to paint
        Iterator it = sourcePanel.layers();
        while (it.hasNext()) {
            MapViewLayer currentLayer = (MapViewLayer) it.next();
            SystemLog.singleton().enter("Rendering " + currentLayer + " to PDF");
 

Re: diffs for on-the-fly image support

2002-05-22 Thread Keiron Liddle


A normal cvs checkout gives you the development code, which is different
from the current maintenance releases.

What you are describing can definitely be done with an extension (in the
devel code only, so this is for later).
In your fo:
<instream-foreign-object width=".." height="..">
  <myImage xmlns="my-space" id="unique-id"/>
</instream-foreign-object>

This small bit of xml will then be passed to your extension available on
the user agent.
This extension gets the image and sets up the PDFGraphics2D and does its
thing.

It should be easier. This way the extra code is contained in a simple
extension. The difference is that you need to use
instream-foreign-object instead of image.
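
To make the handler idea concrete, here is a rough sketch of dispatching a foreign XML fragment to a handler chosen by namespace. It is not the FOP extension API: the ForeignXmlHandler interface, the class name and the method names are hypothetical, and the handler here only returns a description of what it would draw, where a real one would set up a PDFGraphics2D and paint. Only standard JDK XML classes are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class ExtensionDispatchSketch {

    /** Hypothetical handler contract; the actual FOP interface may look different. */
    interface ForeignXmlHandler {
        String handle(Element root);
    }

    // One handler per XML namespace, as registered by the application.
    private final Map<String, ForeignXmlHandler> handlersByNamespace = new HashMap<>();

    void addHandler(String namespaceUri, ForeignXmlHandler handler) {
        handlersByNamespace.put(namespaceUri, handler);
    }

    /** Parse the foreign-object content and pass it to the handler for its namespace. */
    String dispatch(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse foreign XML", e);
        }
        ForeignXmlHandler handler = handlersByNamespace.get(root.getNamespaceURI());
        if (handler == null) {
            throw new IllegalArgumentException("no handler for namespace " + root.getNamespaceURI());
        }
        return handler.handle(root);
    }

    public static void main(String[] args) {
        ExtensionDispatchSketch userAgent = new ExtensionDispatchSketch();
        // The handler uses the id attribute to find the image it should paint.
        userAgent.addHandler("my-space", root -> "would paint image " + root.getAttribute("id"));
        System.out.println(userAgent.dispatch("<myImage xmlns=\"my-space\" id=\"unique-id\"/>"));
    }
}
```

Keying on the namespace means the fo document only carries an id, while everything that knows how to draw stays in the application.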

This class is the default pdf extension that handles svg:
http://cvs.apache.org/viewcvs.cgi/xml-fop/src/org/apache/fop/render/pdf/PDFXMLHandler.java?rev=1.4&content-type=text/vnd.viewcvs-markup








Re: diffs for on-the-fly image support

2002-05-21 Thread Keiron Liddle

Hi Paul,

I don't think we can apply this patch directly, for a number of reasons,
although there are parts of it with value that should be put into cvs
when you have finished.

The patch should be done against cvs rather than against the release,
and what you sent seems to be reversed anyway (I suppose this is what
you are working on). It's better to avoid the various formatting
changes, which really confuse things.

Using temp files can cause problems in certain situations.
I don't see the need for an extra PDFStreamGraphics2D class. Modifying
the PDFGraphics2D should suffice.

An extension may work better in this situation with the development
code, if I understand the problem properly.

Thanks,
Keiron.

On Mon, 2002-05-20 at 17:43, Paul Reavis wrote:
 Attached are gzipped diffs for the changes I made vs. the 0.20.3
 release. I'm working on patches against CVS, but am pretty busy and
 wanted to get something out soonest.
 
 Essentially the patch includes:
 - support for callback-based, on-the-fly images (URLs like
 onthefly:SomeImage, you have to preregister the callback named
 SomeImage before running the FOP transformation)
 
 - a modified PDFGraphics2D called PDFStreamGraphics2D that does not
 use an intermediate byte buffer, but renders direct to a PDFStream
 
 - modified PDFStream so that it caches to tempfiles on disk rather
 than to heap
 
 - modified the drawImage portion of PDFStreamGraphics2D so that it
 only creates a new xObject for the image if it has never seen that
 image before, otherwise it reuses the reference
 
 The combination of these things took us from render times of up to 10
 minutes and hundreds of megabytes of heap to render times of less than
 10 seconds and less than 64MB of heap (the default max heap size).







Re: diffs for on-the-fly image support

2002-05-21 Thread Paul Reavis

Keiron Liddle ([EMAIL PROTECTED]) wrote to FOP on Tue, May 21, 2002 at 11:47:19AM +0200:

 I don't think we can apply this patch directly for a number of reasons.
 Although there are parts in it with value that should be put into cvs
 when you have finished.

I figured as much. Mainly I wanted to get an example out for anyone to
look at; the code I wrote is hardly high quality and I would rather do
a more careful modification against CVS.

Sorry about the formatting changes - emacs redid the indentation
according to my weird standard and I was just too lazy to fix it
back. Is there an apache or fop standard style or style canonicalizer?
I know some projects use a tool to fix style to a standard.

 Using temp files can cause problems in certain situations.

Agreed. Here are some possible solutions:
1) a boolean switch (in the api or system properties)
2) intelligence in the buffer itself, where it uses a tempfile after a
certain size is reached
3) better overall architecture where buffers are immediately flushed
to output rather than remaining in memory

(3) seems best and is in line with the next-gen design documents I see
on the fop site, but I don't know how far along y'all are with that. I
have to use a similar architecture for my map translation software;
GIS systems are hundreds of megabytes and scalability requires a flat
memory usage model. All my buffers are strictly memory-limited.

(1) is easy enough

(2) would be fine but probably has pitfalls; the problem is that there
are a _lot_ of these buffers and PDFStreams running around, and
therefore it's a global problem - I counted dozens for one plot, 24MB
total. 

I was planning on using a switch for the cvs patch, unless y'all have
(3) figured out.
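
For reference, option (2) could look roughly like the following: a buffer that stays on the heap until it passes a size threshold and then moves to a tempfile. This is a sketch under assumptions, not the PDFStream code from the patch; the class name, method names and the threshold value are made up for illustration, and it is written in modern Java.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class SpillingBufferSketch {

    private final int thresholdBytes;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private File tempFile;
    private OutputStream fileOut;
    private long size;

    public SpillingBufferSketch(int thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /** Append data, moving everything to a tempfile once the threshold is exceeded. */
    public void append(byte[] data) {
        try {
            if (fileOut == null && size + data.length > thresholdBytes) {
                spillToDisk();
            }
            if (fileOut != null) {
                fileOut.write(data);
            } else {
                memory.write(data);
            }
            size += data.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void spillToDisk() throws IOException {
        tempFile = File.createTempFile("pdfstream", ".tmp");
        tempFile.deleteOnExit();
        fileOut = new FileOutputStream(tempFile);
        memory.writeTo(fileOut); // carry over what was buffered so far
        memory = null;           // let the heap copy be collected
    }

    public boolean isOnDisk() {
        return fileOut != null;
    }

    public long size() {
        return size;
    }

    /** Copy the buffered content, from heap or tempfile, to the final output. */
    public void copyTo(OutputStream out) {
        try {
            if (fileOut == null) {
                memory.writeTo(out);
            } else {
                fileOut.flush();
                Files.copy(tempFile.toPath(), out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Release the tempfile, if any. The buffer must not be used afterwards. */
    public void discard() {
        try {
            if (fileOut != null) {
                fileOut.close();
                tempFile.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        memory = null;
    }

    public static void main(String[] args) {
        SpillingBufferSketch buffer = new SpillingBufferSketch(16);
        buffer.append(new byte[10]); // 10 bytes: stays in memory
        buffer.append(new byte[10]); // 20 bytes: crosses the threshold, spills
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        buffer.copyTo(out);
        System.out.println(buffer.isOnDisk() + " " + out.size());
        buffer.discard();
    }
}
```

The pitfall mentioned above still applies: with dozens of small buffers per plot, a per-buffer threshold may never trigger even though the total is large, so a shared budget across buffers might be needed.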

 I don't see the need for an extra PDFStreamGraphics2D class. Modifying
 the PDFGraphics2D should suffice.

Agreed. I just didn't want to break the existing code (the current patch
uses PDFStreamGraphics2D just for my case).
 
 An extension may work better in this situation with the development
 code. If I understand the problem properly.

?? An extension to the code, or a file extension for the URL? I'm not
sure what you mean.

As far as my plans for the other features:

I figure the drawImage hack is a no-brainer. It's just the right thing
to do in that instance. The additional memory usage should be no big
deal (it's a hash of image pointer to integer ID).
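
The "hash of image pointer to integer ID" can be illustrated as follows. This is a sketch of the idea only, assuming an identity-based map; the class and method names are hypothetical and the embedding step is replaced by a counter, so it is not the drawImage code from the patch.

```java
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.util.IdentityHashMap;
import java.util.Map;

public class ImageReferenceCacheSketch {

    // Keyed on object identity, so the same Image instance always maps to one ID.
    private final Map<Image, Integer> idsByImage = new IdentityHashMap<>();
    private int nextId = 1;
    private int embedCount = 0;

    /** Return the ID for this image, embedding it only the first time it is seen. */
    int referenceFor(Image image) {
        Integer id = idsByImage.get(image);
        if (id == null) {
            id = nextId++;
            embedCount++; // a real implementation would write the image data here
            idsByImage.put(image, id);
        }
        return id;
    }

    int embedCount() {
        return embedCount;
    }

    public static void main(String[] args) {
        ImageReferenceCacheSketch cache = new ImageReferenceCacheSketch();
        Image icon = new BufferedImage(16, 16, BufferedImage.TYPE_INT_ARGB);

        // The same icon drawn three times yields one embedded copy and three references.
        int first = cache.referenceFor(icon);
        int second = cache.referenceFor(icon);
        int third = cache.referenceFor(icon);
        System.out.println((first == second && second == third) + " " + cache.embedCount());
    }
}
```

One thing to keep in mind is that the map holds a strong reference to every image it has seen, so it should be dropped or cleared when the document is finished.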

I'll just modify PDFGraphics2D directly to use the underlying
PDFStream. I think this is fine for all cases.

Should I break it up into several patches?
- tempfile buffering
- drawImage hack
- PDFGraphics2D hack
- on-the-fly images

-- 

Paul Reavis  [EMAIL PROTECTED]
Design Lead
Partner Software, Inc.  http://www.partnersoft.com





Re: diffs for on-the-fly image support

2002-05-21 Thread Paul Reavis

J.U. Anderegg ([EMAIL PROTECTED]) wrote to [EMAIL PROTECTED] on Tue, May 21, 2002 at 05:31:53PM +0200:

 Inserting JPEG into a PDF file is a simple file copy - given the URI,
 bits/pixel and color model. The latter are coded within JPEG files. PDF
 stores the image once and allows multiple references to it. Is programmed
 caching superior to the caching of the file system?
 
 From PDF view, memory = (JPEG file size + PDF encoded image) is needed at
 most during the lifetime of an output page in memory. Why isn't that so:
 device independence, AWT compatibility?
 
 Similar considerations apply to GIF, TIFF and Fax formats.

I'm not sure exactly what you're referring to.

My hacks primarily address the issue I had of rendering large vector
plots of maps to pdf. The images that are used do not already exist as
jpegs or any other form; they are an amalgam of vector routines and
raster icons, and the icons are rotated in memory for
speed. Generating this mess to svg, then into pdf was very time
consuming and memory-intensive. So I switched to rendering directly
into the pdf using the existing PDFGraphics2D, which allowed me to use
the exact same routines that I use to render to the AWT window. 

Once I got that working I ran into memory problems, because the
current design of the PDF generation code keeps a lot of things in
memory as buffers, I believe because it doesn't know exactly where in
the pdf file the data will be placed at final output - it's juggling
layout etc.

So, this is not a case of file buffering, but of storing chunks
of rendered pdf for later use. My hack puts them in tempfiles rather
than in in-memory buffers. This is obviously slower but more scalable.

As for programmed caching being superior to file system caching, well,
that's another debate and really depends on the operating system. For
Windows systems, especially over SMB networks, the answer is generally
yes, because they are not very aggressive about disk caching and they
flush a lot. Linux on the other hand is very aggressive (on my 754MB
development machine, 320MB is being used for disk cache), and flushes
less often. At least in my experience... in one case I got a 20x speed
increase with a decent caching framework; I finally noticed that machines
with crappy disk drive subsystems were far slower than those with good
ones even at the same memory and cpu speed.

-- 

Paul Reavis  [EMAIL PROTECTED]
Design Lead
Partner Software, Inc.  http://www.partnersoft.com
