Re: Thoughts about image handling

2006-06-26 Thread Jeremias Maerki

On 22.06.2006 17:09:20 Max Berger wrote:
> Jeremias,
> 
> > Actually, you left out the pre-loading of the image size. That's
> > important if you want to delay image loading until the rendering stage
> > (or avoid it altogether). Note that not in all cases will it be
> > necessary to load an image. Sometimes only references to the images are
> > put in the output format. Currently, this only applies to RTF but could
> > actually be used in PostScript and maybe AFP output. Special languages
> > such as PPML even go so far as to make it their prime purpose not having to
> > handle all the image data but providing them on the target platform.
> 
> Good point. This requires parsing the header without actually parsing  
> the whole imagedata.  I know that ImageIO provides this capability,  
> but I am not sure about JIMI and other solutions.
> 
> AAMOF the ImageIO interface provides the exact capabilities required:  
> Registering of Image Readers, providing support for setting the image  
> data source, then getting all the meta-info without decoding the  
> actual image. Unfortunately ImageIO does not seem to support getting  
> the "raw" image data, but that functionality is available in fop's  
> JpegImage.

Remember that FOP needs to maintain compatibility with JDK 1.3, so relying
entirely on ImageIO is currently not an option. But we already have
classes which handle the pre-loading for the most important image
formats. See org.apache.fop.images.analyser.

> So there are more steps:
> - detect file format (may already read metadata and image if needed)
> - read metadata (may read image if needed)
> - read image data OR get raw image data.

That's actually what we're already doing today. :-)

> > Going towards Raster or RenderedImage for the in-memory representation
> > of the image is certainly a very welcome step.
> >
> > What I'm missing a little is that certain images will be converted
> > before they are processed by the renderer. For example, Barcode4J
> > converts its barcodes to SVG, EPS or Java2D graphics depending on the
> > output format in use. Generally, each renderer will have different
> > preferences how an image will be processed.
> 
> That is exactly the problem I see in the current renderers:
> 
> To render MathML or plan (in the examples) to pdf they are first  
> rendered as SVG and then the SVG is rendered via Java2D into the PDF.  
> I don't know how barcode4j does it, but I would assume it is similar.

Actually, Barcode4J is much more advanced. It supports native EPS
construction and painting using Graphics2D. Working today. It's only a
matter of writing an XMLHandler implementation for MathML to do the same.
No witchcraft involved.

> 
> However, every one of the renderers (or at least the awt, ps, and pdf  
> renderers) support a Java2D compatibility interface, which is  
> currently used for SVG images.

...through XMLHandler and AbstractGraphics2DAdapter

> What I propose is:
> - If there is a Java2D interface, offer that directly to all vector  
> image providers.

we have that today

> - If there is no Java2D interface (such as in the rtf output) then  
> render the vector image into a bitmap image (awt has standard support  
> for that, however I do not know if it works headless in 1.3) and use  
> that.

See

> > While PDF can embed TIFF
> > CCITT4 files directly, they have to be decoded for PCL. The ideal image
> > subsystem will also cache a preconverted image so the
> > conversion/decoding can be avoided next time the image is used.
> 
> Ok. Here's an idea: Use a Map ImageURL-> ImageInfo, where ImageInfo  
> contains mime-type, metadata, raw content and decoded content (if  
> possible). Every one of them may be null, and will be loaded on  
> demand. For the actual imagedata, a SoftReference may be used.

That's not any different from today except that you imply support for
loading raw AND decoded data. I'd actually decouple the actual flavor 
(raw or decoded or whatever) from the actual image info. That way the
large data blobs are reclaimable individually.
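
To make the decoupling concrete, here is a minimal sketch, not existing FOP
code. The class name ImageInfoSketch, the flavor strings ("raw", "decoded")
and the method names are all made up for illustration, and the sketch uses
current Java syntax rather than anything that compiles on JDK 1.3. The small
facts (URI, MIME type, size) stay strongly referenced, while each flavor sits
behind its own SoftReference so the garbage collector can drop one blob
without touching the others.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder: cheap metadata plus independently reclaimable flavors. */
class ImageInfoSketch {
    final String uri;
    final String mimeType;
    final int width;
    final int height;

    // One soft reference per flavor, so each large blob can be reclaimed on its own.
    private final Map<String, SoftReference<Object>> flavors = new HashMap<>();

    ImageInfoSketch(String uri, String mimeType, int width, int height) {
        this.uri = uri;
        this.mimeType = mimeType;
        this.width = width;
        this.height = height;
    }

    /** Caches one representation of the image under a flavor name. */
    void putFlavor(String flavor, Object data) {
        flavors.put(flavor, new SoftReference<>(data));
    }

    /** Returns the cached data, or null if it was never loaded or has been reclaimed. */
    Object getFlavor(String flavor) {
        SoftReference<Object> ref = flavors.get(flavor);
        return ref == null ? null : ref.get();
    }
}
```

A caller that gets null back simply reloads or reconverts that one flavor;
the metadata never has to be read again.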

> > I don't think that'll work considering the above. I rather think the
> > Renderer will have to tell the image subsystem the preferred flavor of
> > the image. It will then receive the image in the right form if that is
> > possible.
> 
> How about this (for bitmaps):
> 
> An image has a "native" format, which describes the raw stream if  
> possible. Typical values are EPS, DCT, RLE, LZW, CCITT, and so on.
> The renderer can then check if it has support for this raw data type.  
> If so, it will use it. If not, it will have to use the decoded Raster  
> data.

Again, I don't see the difference to today.

> We can provide additional compressors which will take raster data and  
> provide raw data in one of the lossless formats (gzip, rle). They can  
> be used by the renderer to reduce file size.

Today, this is handled directly by the renderer. Don't see any need to
change it. These compressor

Re: Thoughts about image handling

2006-06-23 Thread Max Berger

Jeremias,


> Actually, you left out the pre-loading of the image size. That's
> important if you want to delay image loading until the rendering stage
> (or avoid it altogether). Note that not in all cases will it be
> necessary to load an image. Sometimes only references to the images are
> put in the output format. Currently, this only applies to RTF but could
> actually be used in PostScript and maybe AFP output. Special languages
> such as PPML even go so far as to make it their prime purpose not having to
> handle all the image data but providing them on the target platform.


Good point. This requires parsing the header without actually parsing  
the whole imagedata.  I know that ImageIO provides this capability,  
but I am not sure about JIMI and other solutions.


AAMOF the ImageIO interface provides the exact capabilities required:  
Registering of Image Readers, providing support for setting the image  
data source, then getting all the meta-info without decoding the  
actual image. Unfortunately ImageIO does not seem to support getting  
the "raw" image data, but that functionality is available in fop's  
JpegImage.


So there are more steps:
- detect file format (may already read metadata and image if needed)
- read metadata (may read image if needed)
- read image data OR get raw image data.


> Going towards Raster or RenderedImage for the in-memory representation
> of the image is certainly a very welcome step.
>
> What I'm missing a little is that certain images will be converted
> before they are processed by the renderer. For example, Barcode4J
> converts its barcodes to SVG, EPS or Java2D graphics depending on the
> output format in use. Generally, each renderer will have different
> preferences how an image will be processed.


That is exactly the problem I see in the current renderers:

To render MathML or plan (in the examples) to pdf they are first  
rendered as SVG and then the SVG is rendered via Java2D into the PDF.  
I don't know how barcode4j does it, but I would assume it is similar.


However, every one of the renderers (or at least the awt, ps, and pdf  
renderers) support a Java2D compatibility interface, which is  
currently used for SVG images.


What I propose is:
- If there is a Java2D interface, offer that directly to all vector  
image providers.
- If there is no Java2D interface (such as in the rtf output) then  
render the vector image into a bitmap image (awt has standard support  
for that, however I do not know if it works headless in 1.3) and use  
that.
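
The bitmap fallback in the second bullet could look roughly like the sketch
below. VectorPainter and RasterFallbackSketch are hypothetical names standing
in for whatever a vector image provider would expose; only BufferedImage and
Graphics2D are real JDK classes. The sketch uses current Java syntax and does
not answer the question of headless operation on JDK 1.3.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical stand-in for a vector image provider that can paint itself. */
interface VectorPainter {
    void paint(Graphics2D g);
}

class RasterFallbackSketch {
    /** Paints the vector image into an off-screen ARGB bitmap of the given pixel size. */
    static BufferedImage rasterize(VectorPainter painter, int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = image.createGraphics();
        try {
            painter.paint(g);
        } finally {
            g.dispose(); // release the graphics context even if painting fails
        }
        return image;
    }
}
```

A renderer without a Java2D interface would then treat the returned
BufferedImage like any other bitmap image.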



> While PDF can embed TIFF
> CCITT4 files directly, they have to be decoded for PCL. The ideal image
> subsystem will also cache a preconverted image so the
> conversion/decoding can be avoided next time the image is used.


Ok. Here's an idea: Use a Map ImageURL-> ImageInfo, where ImageInfo  
contains mime-type, metadata, raw content and decoded content (if  
possible). Every one of them may be null, and will be loaded on  
demand. For the actual imagedata, a SoftReference may be used.



> I don't think that'll work considering the above. I rather think the
> Renderer will have to tell the image subsystem the preferred flavor of
> the image. It will then receive the image in the right form if that is
> possible.


How about this (for bitmaps):

An image has a "native" format, which describes the raw stream if  
possible. Typical values are EPS, DCT, RLE, LZW, CCITT, and so on.
The renderer can then check if it has support for this raw data type.  
If so, it will use it. If not, it will have to use the decoded Raster  
data.


We can provide additional compressors which will take raster data and  
provide raw data in one of the lossless formats (gzip, rle). They can  
be used by the renderer to reduce file size.
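
As a rough sketch of such a compressor, the following uses
java.util.zip.Deflater, which by default produces a zlib-wrapped deflate
stream. The class and method names are hypothetical, and the input is assumed
to be the raster samples already flattened into a byte array.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

class RasterCompressorSketch {
    /** Compresses flattened raster bytes losslessly into a zlib stream. */
    static byte[] deflate(byte[] rasterBytes) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(rasterBytes);
        deflater.finish(); // no more input will follow

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end(); // free native resources
        return out.toByteArray();
    }
}
```

A run-length encoder would follow the same shape: raster bytes in, raw
stream out, with the renderer deciding whether the result is worth using.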







> > To support extensibility, a registration mechanism is provided. Here
> > is the basic idea:
> >
> > Java provides standard mechanisms to find all resources with a given
> > name in all classpath items. This makes it possible to find all
> > META-INF/MANIFEST.MF files given in all JAR files in the classpath (1).
> > These files can be parsed using standard Manifest functionality.
> >
> > The files contain some attributes that describe classes used. For
> > image handlers, this could be a classname and the supported image
> > type. It may contain additional attributes, such as supported
> > subtypes (e.g. LZW for TIFF). Ideally the exact specification of
> > these attributes would be coordinated between fop and foray to
> > support reuse.
> >
> > This information can be parsed once and stored.
> >
> > This mechanism requires the user to change only the classpath, and
> > nothing else.
>
> Ok, something like that sounds pretty good. Remains to be seen whether
> the config needs to be in a file or rather in a factory class like we've
> done it before (example: AbstractRendererMaker). The only thing left
> might be the question how to handle priorities if two implementations
> support the same kind of image.
> What you describe here is already in use in FOP and Batik. We don't use
> the MANIFEST.MF directly but the class

Re: Thoughts about image handling

2006-06-19 Thread Jeremias Maerki

On 19.06.2006 16:52:25 Max Berger wrote:
> Dear Fop developers,
> 
> after a while and some thinking, here is my concept for extensible  
> image handlers for fop, or even better for xmlgraphics. If desired, I  
> can implement this concept for xmlgraphics / fop with support for  
> imageio, jimi and batik.
> 
> Image handling is  a three-step process. Step one is detecting the  
> file format, step two is loading the image, step three is outputting  
> the image into whatever the output renderer is.

Actually, you left out the pre-loading of the image size. That's
important if you want to delay image loading until the rendering stage 
(or avoid it altogether). Note that not in all cases will it be
necessary to load an image. Sometimes only references to the images are
put in the output format. Currently, this only applies to RTF but could
actually be used in PostScript and maybe AFP output. Special languages
such as PPML even go so far as to make it their prime purpose not having to
handle all the image data but providing them on the target platform.
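
To illustrate what pre-loading the size means in practice, here is a minimal
sketch for PNG only. It is not the code in FOP's analyser classes; the class
name PngSizePreloader is invented for this example. It relies on the PNG
layout: an 8-byte signature, followed by the IHDR chunk whose data starts
with the width and height as big-endian 32-bit integers. Only the first 24
bytes of the stream are read.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class PngSizePreloader {
    private static final long PNG_SIGNATURE = 0x89504E470D0A1A0AL;
    private static final int IHDR = 0x49484452; // ASCII "IHDR"

    /** Returns {width, height} in pixels without decoding any pixel data. */
    static int[] readSize(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readLong() != PNG_SIGNATURE) {
            throw new IOException("Not a PNG stream");
        }
        data.readInt(); // length of the IHDR chunk, not needed here
        if (data.readInt() != IHDR) {
            throw new IOException("IHDR chunk expected first");
        }
        int width = data.readInt();
        int height = data.readInt();
        return new int[] {width, height};
    }
}
```

With the size known, layout can proceed and the actual image data can be
loaded later, or never, if the output format only needs a reference.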

> 
> Step 1: Detecting the file format.
> 
> In most handlers this currently happens by trying to load the files,  
> which is very inefficient. Instead, detectors like jmimemagic should  
> be used. These should return a mime-type or null if the type can not  
> be detected.
> 
> To speed up the process, a mime-type may be guessed from the file  
> extension, and added as "hint"
> 
> Example interface:
> 
> public interface MimeTypeDetector {
>   String detectMimeType(URL file, String probableType) throws IOException;
> }

Ok, so far I don't see any advantage over the current code.

> Step 2: Loading the image.
> 
> The image can then be loaded by any image handler that supports this  
> type.
> 
> Example:
> 
> public interface ImageHandler {
>   void setImage(URL file) throws IOException;
>   // ...
> }

In your scenario, when would the image actually be loaded?

> Step 3:  Outputting the image.
> 
> Generally there are three types of images. Vector images (SVG, MML),  
> bitmap images (GIF, BMP, JPEG), and uninterpreted images (EPS)
> 
> - Vector images must supply a paint(Graphics g) function
> - Bitmap images must supply a method to get the image contents as a
> Raster (similar to java.awt.image.RenderedImage)
> - Bitmap images may supply a method to get the image contents in LZW,
> ZLIB, or DCT format, such as in JPEG or TIFF compression (used in PDF)
> - Uninterpreted images just provide a method to get the contents in
> their original format.

Going towards Raster or RenderedImage for the in-memory representation
of the image is certainly a very welcome step.

What I'm missing a little is that certain images will be converted
before they are processed by the renderer. For example, Barcode4J
converts its barcodes to SVG, EPS or Java2D graphics depending on the
output format in use. Generally, each renderer will have different
preferences how an image will be processed. While PDF can embed TIFF
CCITT4 files directly, they have to be decoded for PCL. The ideal image
subsystem will also cache a preconverted image so the
conversion/decoding can be avoided next time the image is used.

> Reflection may be used in the renderer to find out the image type,  
> rather than checking for the type. Example:
> 
> if (image instanceof VectorImage) {
>   ((VectorImage) image).paint(graphics);
> } else if (image instanceof DCTEncodedImage) {
>   addResource(((DCTEncodedImage) image).getDCTData());
> }
> // ...

I don't think that'll work considering the above. I rather think the
Renderer will have to tell the image subsystem the preferred flavor of
the image. It will then receive the image in the right form if that is
possible.
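
One way to picture this: the renderer hands over an ordered list of flavors
it can consume, and the image subsystem answers with the first one it can
supply. The sketch below is only an illustration under that assumption. The
class name, the flavor strings and the methods are hypothetical, and a real
subsystem would also try conversions rather than only look at what is
already available.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlavorNegotiationSketch {
    // Flavor name -> data currently available for one image (hypothetical model).
    private final Map<String, Object> available = new HashMap<>();

    /** Registers a representation the subsystem is able to hand out. */
    void offer(String flavor, Object data) {
        available.put(flavor, data);
    }

    /** Returns the first flavor in the renderer's preference order that exists, or null. */
    String select(List<String> preferred) {
        for (String flavor : preferred) {
            if (available.containsKey(flavor)) {
                return flavor;
            }
        }
        return null;
    }

    /** Returns the data for a flavor previously chosen with select(). */
    Object get(String flavor) {
        return available.get(flavor);
    }
}
```

A PDF renderer might ask for a raw CCITT4 stream first and a decoded raster
second, while a PCL renderer would ask for the raster only.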

> 
> To support extensibility, a registration mechanism is provided. Here  
> is the basic idea:
> 
> Java provides standard mechanisms to find all resources with a given
> name in all classpath items. This makes it possible to find all
> META-INF/MANIFEST.MF files given in all JAR files in the classpath (1).
> These files can be parsed using standard Manifest functionality.
> 
> The files contain some attributes that describe classes used. For  
> image handlers, this could be a classname and the supported image  
> type. It may contain additional attributes, such as supported  
> subtypes (e.g. LZW for TIFF). Ideally the exact specification of  
> these attributes would be coordinated between fop and foray to  
> support reuse.
> 
> This information can be parsed once and stored.
> 
> This mechanism requires the user to change only the classpath, and  
> nothing else.

Ok, something like that sounds pretty good. Remains to be seen whether
the config needs to be in a file or rather in a factory class like we've
done it before (example: AbstractRendererMaker). The only thing left
might be the question how to handle priorities if two implementations
support the same kind of image.

> I have written a short proof-of concept code for the registration,  
> available at
>

Thoughts about image handling

2006-06-19 Thread Max Berger

Dear Fop developers,

after a while and some thinking, here is my concept for extensible  
image handlers for fop, or even better for xmlgraphics. If desired, I  
can implement this concept for xmlgraphics / fop with support for  
imageio, jimi and batik.


Image handling is  a three-step process. Step one is detecting the  
file format, step two is loading the image, step three is outputting  
the image into whatever the output renderer is.



Step 1: Detecting the file format.

In most handlers this currently happens by trying to load the files,  
which is very inefficient. Instead, detectors like jmimemagic should  
be used. These should return a mime-type or null if the type can not  
be detected.


To speed up the process, a mime-type may be guessed from the file  
extension, and added as "hint"


Example interface:

public interface MimeTypeDetector {
  String detectMimeType(URL file, String probableType) throws IOException;
}
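
A detector of this kind can be sketched with plain magic-number checks. The
example below is not jmimemagic and does not implement the interface above
literally: to stay self-contained it takes the first bytes of the file
instead of a URL, and the class name MagicNumberDetector is made up. The
hint is tried first, and the remaining signatures are only checked if the
hint does not match.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MagicNumberDetector {
    // MIME type -> leading bytes that identify the format.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("image/png",
                new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put("image/gif", new byte[] {'G', 'I', 'F', '8'});
        SIGNATURES.put("image/jpeg", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the detected MIME type, or null if the header is not recognized. */
    static String detectMimeType(byte[] header, String probableType) {
        // Check the hinted type first; it is a guess, so the bytes still have to agree.
        byte[] hinted = probableType == null ? null : SIGNATURES.get(probableType);
        if (hinted != null && startsWith(header, hinted)) {
            return probableType;
        }
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(header, entry.getValue())) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```

A file named logo.png that actually contains GIF data would be reported as
image/gif here, because the hint is never trusted on its own.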

Step 2: Loading the image.

The image can then be loaded by any image handler that supports this  
type.


Example:

public interface ImageHandler {
  void setImage(URL file) throws IOException;
  // ...
}

Step 3:  Outputting the image.

Generally there are three types of images. Vector images (SVG, MML),  
bitmap images (GIF, BMP, JPEG), and uninterpreted images (EPS)


- Vector images must supply a paint(Graphics g) function
- Bitmap images must supply a method to get the image contents as a
Raster (similar to java.awt.image.RenderedImage)
- Bitmap images may supply a method to get the image contents in LZW,
ZLIB, or DCT format, such as in JPEG or TIFF compression (used in PDF)
- Uninterpreted images just provide a method to get the contents in
their original format.


Reflection may be used in the renderer to find out the image type,  
rather than checking for the type. Example:


if (image instanceof VectorImage) {
  ((VectorImage) image).paint(graphics);
} else if (image instanceof DCTEncodedImage) {
  addResource(((DCTEncodedImage) image).getDCTData());
}
// ...


To support extensibility, a registration mechanism is provided. Here  
is the basic idea:


Java provides standard mechanisms to find all resources with a given
name in all classpath items. This makes it possible to find all
META-INF/MANIFEST.MF files given in all JAR files in the classpath (1).
These files can be parsed using standard Manifest functionality.


The files contain some attributes that describe classes used. For  
image handlers, this could be a classname and the supported image  
type. It may contain additional attributes, such as supported  
subtypes (e.g. LZW for TIFF). Ideally the exact specification of  
these attributes would be coordinated between fop and foray to  
support reuse.


This information can be parsed once and stored.

This mechanism requires the user to change only the classpath, and  
nothing else.
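
The lookup itself can be sketched in a few lines. This is not the code in
the proof-of-concept jars. The attribute names Image-Handler-Class and
Image-Handler-Mime-Type are invented for this example, as is the class name
ManifestRegistrySketch. ClassLoader.getResources and java.util.jar.Manifest
are the standard JDK pieces referred to above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class ManifestRegistrySketch {
    // Hypothetical attribute names; the real ones would have to be agreed upon.
    static final String HANDLER_CLASS = "Image-Handler-Class";
    static final String HANDLER_TYPE = "Image-Handler-Mime-Type";

    /** Returns {className, mimeType} if the manifest declares a handler, else null. */
    static String[] readHandler(Manifest manifest) {
        Attributes main = manifest.getMainAttributes();
        String className = main.getValue(HANDLER_CLASS);
        String mimeType = main.getValue(HANDLER_TYPE);
        if (className == null || mimeType == null) {
            return null;
        }
        return new String[] {className, mimeType};
    }

    /** Scans every META-INF/MANIFEST.MF visible to the given class loader. */
    static List<String[]> discover(ClassLoader loader) throws IOException {
        List<String[]> handlers = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources("META-INF/MANIFEST.MF");
        while (urls.hasMoreElements()) {
            try (InputStream in = urls.nextElement().openStream()) {
                String[] handler = readHandler(new Manifest(in));
                if (handler != null) {
                    handlers.add(handler);
                }
            }
        }
        return handlers;
    }
}
```

Calling discover once at startup and keeping the result matches the "parsed
once and stored" idea; adding a provider jar to the classpath is then enough
to make its handler show up in the list.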


I have written a short proof-of-concept for the registration,
available at

  http://max.berger.name/tmp/extTestMain.jar
and
  http://max.berger.name/tmp/extTestProvider.jar

(source is included in the jar files).

To run, try:
  java -cp extTestMain.jar name.berger.max.test.ext.Main
or
  java -cp extTestMain.jar:extTestProvider.jar name.berger.max.test.ext.Main




(1) Of course, it doesn't have to be MANIFEST.MF.  For resources such  
as fonts this may well be META-INF/FontDescriptor.xml or something else.




questions? comments?

Max Berger
e-mail: [EMAIL PROTECTED]

--
PGP/GnuPG ID: E81592BC   Print: F489F8759D4132923EC4 BC7E072AB73AE81592BC
For information about me or my projects please see http://max.berger.name




