The source of the problem is that FOP is calculating the wrong base URI for the images. The purpose of xml:base is to rectify situations like this, but it seems that FOP does not know what to do with xml:base. For you, it reports xml:base as an invalid property on the fo:block element. For me, it just ignores xml:base and continues (still resolving the image URIs incorrectly, of course).

The first thing I would verify is that your XSLT and/or XSL-FO files are aware of the "xml" namespace. I tried this first on my setup, but it did not help. Maybe it will help you?

If that doesn't work, there are certainly ways to work around the problem:

1) Download the images to a corresponding local directory in a pre-processing step. Or you could write a Xalan extension that does basically the same thing "inline" in the XSLT.

2) Explicitly compute the base URI for the particular HTML document you're processing and use that value to create an absolute URI in <fo:external-graphic>, something like "concat($base-URI,@src)". You could either pass in $base-URI as a global XSLT parameter (if you know it beforehand), or calculate it dynamically for each document you process by taking the directory of the a/@href attribute (the portion of the URI leading up to the file, but not the file name itself). Note that an @src starting with "/", like the ones in your example, is relative to the server root, so for those the value to prepend is just the scheme and host. In the dynamic case you'd send the value along to the matching templates with <xsl:apply-templates><xsl:with-param name="base-URI" select="blah"/></xsl:apply-templates>, and you'd need to make sure every template along the way declares the parameter with <xsl:param name="base-URI"/> and passes it on, since XSLT 1.0 does not forward parameters automatically. Then, when you're outputting the external graphic, you'd do something like <fo:external-graphic src="url('{concat($base-URI,@src)}')"/>.

3) Tell FOP to use a given base-URI via the configuration file. See http://xmlgraphics.apache.org/fop/0.94/configuration.html. I actually used this option with your example files and it worked for me. No change was needed to your files, and I didn't need to use xml:base. The catch is that you need to know the base-URI beforehand and you can only have one value per FOP run, so for several sites you would have to loop through calls to FOP, updating the config file appropriately on each call.
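
For option 2, the global-parameter variant could look roughly like the sketch below. It is only an illustration, not a tested stylesheet: the default value of $base-URI is an assumption (scheme and host of the example page, because the image paths there start with "/"), and the html prefix is assumed to be bound to the XHTML namespace as in your stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:html="http://www.w3.org/1999/xhtml">

  <!-- Assumed value: scheme and host of the site the pages come from.
       Override it from outside if the pages live somewhere else. -->
  <xsl:param name="base-URI" select="'http://ica-atom.org'"/>

  <!-- Prepend the base to each image path so FOP receives an absolute URI. -->
  <xsl:template match="html:img">
    <fo:external-graphic src="url('{concat($base-URI, @src)}')"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the parameter is global, it is visible in every template, so nothing has to be passed along with <xsl:with-param>. That only becomes necessary if the base has to change per document.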

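For option 3, a minimal configuration file might look like the following. Treat it as a sketch: the <base> element is the one described on the configuration page linked above, but the URL shown here is an assumption derived from the example page, and the file name fop.xconf is just a placeholder.

```xml
<?xml version="1.0"?>
<!-- fop.xconf (hypothetical file name) -->
<fop version="1.0">
  <!-- Base URL against which FOP resolves relative URLs such as image paths.
       Assumed value: the directory of the example HTML page. -->
  <base>http://ica-atom.org/docs/</base>
</fop>
```

The file is handed to FOP with the -c option, e.g. "fop -c fop.xconf -xml index.html -xsl index.xsl index.pdf".
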
Cheers,
Nathan Nadeau


Jack Bates wrote:
I originally posted this question to the FOP users list,
http://thread.gmane.org/gmane.text.xml.fop.user/29778

I'm writing a stylesheet that includes this template,

  <xsl:template match="html:img">
    <fo:external-graphic src="url('{...@src}')"/>
  </xsl:template>

- but when I process it I get errors like,

 $ fop -xml index.html -xsl index.xsl index.pdf
 [...]
 16-Nov-2009 11:06:32 AM org.apache.fop.events.LoggingEventListener processEvent
 SEVERE: Image not found. URI: /docs/images/thumb/b/bb/UM-2.1.png/500px-UM-2.1.png. (No context info available)
 [...]

- and the generated PDF is missing some images : (

I think the problem is that the stylesheet uses the document() function
to get HTML pages from the web,

  <xsl:template match="html:a" mode="foo">
    <fo:block>
      <xsl:apply-templates select="document(@href)//html:body"/>
    </fo:block>
  </xsl:template>

- and these pages, e.g. http://ica-atom.org/docs/index.php?title=UM-2.1

- have some relative images,
e.g. /docs/images/thumb/b/bb/UM-2.1.png/500px-UM-2.1.png

The <img> elements are part of the HTML document, and so relative to
http://ica-atom.org/docs/index.php?title=UM-2.1, but I guess the
<external-graphic> elements aren't?

I tried adding an xml:base="" attribute,

  <xsl:template match="html:a" mode="foo">
    <fo:block xml:base="{...@href}">
      <xsl:apply-templates select="document(@href)//html:body"/>
    </fo:block>
  </xsl:template>

Here's a simplified, complete example,

http://www.sfu.ca/~jdbates/tmp/fop/200911160/index.html

http://www.sfu.ca/~jdbates/tmp/fop/200911160/index.xsl

- but now when I process this example I get this error,

 $ fop -xml index.html -xsl index.xsl index.pdf
 16-Nov-2009 11:29:57 AM org.apache.fop.cli.Main startFOP
 SEVERE: Exception
 javax.xml.transform.TransformerException: org.apache.fop.fo.ValidationException: Invalid property encountered on "fo:block": xml:base (No context info available)
         at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:314)
         at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:146)
         at org.apache.fop.cli.Main.startFOP(Main.java:174)
         at org.apache.fop.cli.Main.main(Main.java:205)
 [...]

How can I automatically convert these HTML pages to a PDF with XSL-FO?
