The source of the problem is that FOP calculates the wrong base URI for
the images. The purpose of xml:base is to rectify situations like this,
but FOP does not seem to know what to do with xml:base. For you, it
reports xml:base as an invalid property on the fo:block element. For me,
it just ignores xml:base and continues (of course still resolving the
image URIs incorrectly).
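To make the base URI issue concrete, here is a small sketch in Java, using only java.net.URI. It resolves the image src from the quoted question against two bases: a made-up local path standing in for index.html (file:/home/user/fop/index.html is hypothetical), and the web page the <img> actually came from. It assumes FOP ends up resolving against a local file base like the first one. That fits the "Image not found" error below, but it has not been verified.

```java
import java.net.URI;

// Sketch: resolve the image reference from the question against two
// different base URIs. The file: path is a HYPOTHETICAL location of
// index.html; the http: base is the page fetched with document(@href).
class BaseUriDemo {

    // RFC-style resolution of a (possibly relative) reference against a base.
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        String src = "/docs/images/thumb/b/bb/UM-2.1.png/500px-UM-2.1.png";

        // Base = the local input file (hypothetical path).
        // Prints file:/docs/images/thumb/b/bb/UM-2.1.png/500px-UM-2.1.png
        System.out.println(resolve("file:/home/user/fop/index.html", src));

        // Base = the HTML page the <img> element came from.
        // Prints http://ica-atom.org/docs/images/thumb/b/bb/UM-2.1.png/500px-UM-2.1.png
        System.out.println(resolve("http://ica-atom.org/docs/index.php?title=UM-2.1", src));
    }
}
```

The same root-relative src points at two completely different resources depending on the base, which is exactly what xml:base was supposed to control.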
The first thing I would verify is that your XSLT and/or XSL-FO files are
aware of the "xml" namespace. I tried this first on my setup, but it did
not help. Maybe it will help you?
If that doesn't work, there are certainly ways to work around the problem:
1) Download the images to a corresponding local directory in a
pre-processing step. Alternatively, you could write a Xalan extension
that does essentially the same thing "inline" in the XSLT.
2) Explicitly compute the base URI for the particular HTML document
you're processing and use that value to build an absolute URI in
<fo:external-graphic>, something like "concat($base-URI,@src)". You
could either pass $base-URI in as a global XSLT parameter (if you know
it beforehand), or calculate it dynamically for each document you
process by taking the directory part of the a/@href attribute (the
portion of the URI up to, but not including, the file name). Then pass
that value along to the matching templates with
<xsl:apply-templates><xsl:with-param name="base-URI" select="blah"/>,
and make sure those templates declare the parameter with
<xsl:param name="base-URI"/>. Finally, when outputting the external
graphic, you'd do something like
<fo:external-graphic src="url('{concat($base-URI,@src)}')"/>.
3) Tell FOP to use a given base URI via the configuration file. See
http://xmlgraphics.apache.org/fop/0.94/configuration.html. It seems you
can only do this if you know the base URI in advance, and you can only
have one value per FOP run. I actually used this option with your
example files and it worked for me: no changes to your files were
needed, and I didn't need xml:base. Of course, you need to know the base
URI in advance, or else loop through calls to FOP, updating the config
file on each call.
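Workaround 1 could look roughly like this in Java: map each absolute image URI to a path under a local cache directory (mirroring host and path), then download it before running FOP. The class name, the "image-cache" directory, and the host/path layout are hypothetical choices. The download method is also the sort of thing a Xalan extension function could wrap, but that wiring is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Sketch of workaround 1 (pre-download images). All names and the cache
// layout are HYPOTHETICAL; FOP itself requires none of this.
class ImageMirror {

    // Map an absolute URI to a file under cacheRoot,
    // e.g. http://host/a/b.png -> cacheRoot/host/a/b.png
    static Path localPathFor(Path cacheRoot, URI image) {
        if (image.getHost() == null) {
            throw new IllegalArgumentException("Expected an absolute http(s) URI: " + image);
        }
        String path = image.getPath();   // decoded path, e.g. "/docs/images/x.png"
        Path target = cacheRoot.resolve(image.getHost())
                               .resolve(path.startsWith("/") ? path.substring(1) : path)
                               .normalize();
        // Refuse paths that escape the cache via ".." segments.
        if (!target.startsWith(cacheRoot.normalize())) {
            throw new IllegalArgumentException("Refusing to write outside cache: " + image);
        }
        return target;
    }

    // Download one image (needs network access) and return its local path.
    static Path download(Path cacheRoot, URI image) throws IOException {
        Path target = localPathFor(cacheRoot, image);
        Files.createDirectories(target.getParent());
        try (InputStream in = image.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) {
        URI img = URI.create("http://ica-atom.org/docs/images/thumb/b/bb/UM-2.1.png/500px-UM-2.1.png");
        // Only computes the target path; call download() to actually fetch.
        System.out.println(localPathFor(Paths.get("image-cache"), img));
    }
}
```

The stylesheet would then point <fo:external-graphic> at the local copy rather than the remote URL.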
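For workaround 2, here is the same idea sketched in Java with hypothetical names: directoryOf() keeps the part of a/@href up to the last '/', as described above, and concat() mirrors concat($base-URI,@src). One caveat worth checking against the real pages: the src in the question, /docs/images/..., is root-relative. Plain concatenation then gives http://ica-atom.org/docs//docs/images/..., whereas proper URI resolution (resolve() below, via java.net.URI) gives http://ica-atom.org/docs/images/.... The two only agree for path-relative srcs such as images/a.png.

```java
import java.net.URI;

// Sketch of workaround 2. Class and method names are HYPOTHETICAL.
class BaseFromHref {

    // "Directory" of an absolute hierarchical URI: scheme://authority plus
    // the path up to and including the last '/'. Query and fragment are dropped.
    static String directoryOf(String href) {
        URI u = URI.create(href);
        String path = u.getRawPath();
        String dir = path.substring(0, path.lastIndexOf('/') + 1);
        return u.getScheme() + "://" + u.getRawAuthority() + dir;
    }

    // Equivalent of concat($base-URI, @src) in the stylesheet.
    static String concat(String baseUri, String src) {
        return baseUri + src;
    }

    // Proper resolution; also handles root-relative ("/...") and "../" srcs.
    static String resolve(String href, String src) {
        return URI.create(href).resolve(src).toString();
    }

    public static void main(String[] args) {
        String href = "http://ica-atom.org/docs/index.php?title=UM-2.1";
        String src = "/docs/images/thumb/b/bb/UM-2.1.png/500px-UM-2.1.png";
        System.out.println(directoryOf(href));                 // http://ica-atom.org/docs/
        System.out.println(concat(directoryOf(href), src));    // doubled "/docs" segment
        System.out.println(resolve(href, src));                // correct absolute URI
    }
}
```

So if the HTML uses root-relative srcs, the XSLT version probably needs to special-case values starting with "/" (prefixing only the scheme and host) rather than relying on concat() alone.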
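For workaround 3 across several documents, one option is to generate a small config file per run. This sketch writes a file containing only the <fop version="1.0"> root and the <base> element described on the configuration page linked above. The class name and temp-file naming are hypothetical. You would then run something like fop -c <that file> -xml index.html -xsl index.xsl index.pdf once per base URI.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of workaround 3: one FOP config file per base URI.
// Names are HYPOTHETICAL; only the <base> element comes from the FOP docs.
class FopConfigWriter {

    // Minimal escaping so the URI is safe inside element content.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Configuration containing only the base URL for resolving relative URIs.
    static String configFor(String baseUri) {
        return "<?xml version=\"1.0\"?>\n"
             + "<fop version=\"1.0\">\n"
             + "  <base>" + xmlEscape(baseUri) + "</base>\n"
             + "</fop>\n";
    }

    // Write the config to a fresh temp file in dir and return its path.
    static Path writeConfig(Path dir, String baseUri) throws IOException {
        Path cfg = Files.createTempFile(dir, "fop-", ".xml");
        Files.write(cfg, configFor(baseUri).getBytes(StandardCharsets.UTF_8));
        return cfg;
    }

    public static void main(String[] args) {
        System.out.print(configFor("http://ica-atom.org/docs/"));
    }
}
```

Generating the file per call keeps the "one base URI per FOP run" limitation explicit instead of hand-editing the config between runs.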
Cheers,
Nathan Nadeau
Jack Bates wrote:
I originally posted this question to the FOP users list,
http://thread.gmane.org/gmane.text.xml.fop.user/29778
I'm writing a stylesheet that includes this template,
<xsl:template match="html:img">
<fo:external-graphic src="url('{...@src}')"/>
</xsl:template>
- but when I process it I get errors like,
$ fop -xml index.html -xsl index.xsl index.pdf
[...]
16-Nov-2009 11:06:32 AM org.apache.fop.events.LoggingEventListener processEvent
SEVERE: Image not found. URI: /docs/images/thumb/b/bb/UM-2.1.png/500px-UM-2.1.png. (No context info available)
[...]
- and the generated PDF is missing some images : (
I think the problem is that the stylesheet uses the document() function
to get HTML pages from the web,
<xsl:template match="html:a" mode="foo">
<fo:block>
<xsl:apply-templates select="document(@href)//html:body"/>
</fo:block>
</xsl:template>
- and these pages, e.g. http://ica-atom.org/docs/index.php?title=UM-2.1
- have some relative images,
e.g. /docs/images/thumb/b/bb/UM-2.1.png/500px-UM-2.1.png
The <img> elements are part of the HTML document, so their src URIs are
relative to http://ica-atom.org/docs/index.php?title=UM-2.1, but I guess
the <external-graphic> elements aren't?
I tried adding an xml:base="" attribute,
<xsl:template match="html:a" mode="foo">
<fo:block xml:base="{...@href}">
<xsl:apply-templates select="document(@href)//html:body"/>
</fo:block>
</xsl:template>
Here's a simplified, complete example,
http://www.sfu.ca/~jdbates/tmp/fop/200911160/index.html
http://www.sfu.ca/~jdbates/tmp/fop/200911160/index.xsl
- but now when I process this example I get this error,
$ fop -xml index.html -xsl index.xsl index.pdf
16-Nov-2009 11:29:57 AM org.apache.fop.cli.Main startFOP
SEVERE: Exception
javax.xml.transform.TransformerException: org.apache.fop.fo.ValidationException: Invalid property encountered on "fo:block": xml:base (No context info available)
at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:314)
at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:146)
at org.apache.fop.cli.Main.startFOP(Main.java:174)
at org.apache.fop.cli.Main.main(Main.java:205)
[...]
How can I automatically convert these HTML pages to a PDF with XSL-FO?