RE: Cannot convert XHTML to XML-FO using Xalan-J and xhtml2fo.xsl stylesheet

Larry Trammell Fri, 02 Nov 2007 09:47:08 -0800

As a co-newbie user of Xalan, I somewhat recently tried a similar
experiment. To jump to the end of the story, I'm still using Xalan, but not
the xhtml2fo.xsl stylesheet. I managed to get the translation and PDF
rendering to work, but it was not the sort of slam-dunk success I had hoped
for. In the FOP processing, there were seemingly minor details of perfectly
valid XHTML that were not completely implemented, and certain major details
that were implemented in a rigid manner that I couldn't live with. I can't
really blame FOP or the XSL translation. When going from a sparse
representation to an information-rich representation, all of the extra
information has to come from somewhere, and xhtml2fo incorporates a lot of
assumptions by necessity. But taking control of the FO translation was too
complicated, and I gave it up in favor of the commercial Prince package,
which is CSS based -- but no less fussy about valid XHTML on its input side.


Before jumping to the conclusion that Xalan is not working, I would suggest
that you split your application sequence into parts. Hand compose an XHTML
file and validate it with command-line Tidy. Document identification is
critical (I know I'm not using the technically correct terminology here, but
I refuse to memorize the XML, XSL, and XPath specifications as I should do
to make the XSL transformations in Xalan completely comprehensible). The
?xml and !DOCTYPE and namespace declarations are all critical. If they are
not exactly right, Xalan will obey the rules exactly: any element that no
template in the stylesheet recognizes falls through to the built-in default
rules, which drop the tags and spew the text content to the output stream.
As a first step, try something that is likely to work given a valid XHTML
input, such as the "identity transformation":

<!-- The Identity Transformation (from
http://www.dpawson.co.uk/xsl/sect2/identity.html) -->

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Whenever you match any node or any attribute -->
  <xsl:template match="node()|@*">
    <!-- Copy the current node -->
    <xsl:copy>
      <!-- Including any attributes it has and any child nodes -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

If your hand-tested XHTML content doesn't survive this parsing process, it
has no chance with xhtml2fo.xsl either.
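If you would rather run this check from Java than from the command line, here is a minimal sketch. It assumes only the standard JAXP API (javax.xml.transform), which Xalan-J implements; on a stock JDK it runs on the bundled XSLT processor unless Xalan is on the classpath. The class name IdentityCheck and the sample XHTML string are my own inventions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IdentityCheck {
    // The identity stylesheet from the message, inlined as a string.
    static final String IDENTITY_XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='node()|@*'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    static String transform(String xsl, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical hand-composed input; substitute your own validated file.
        String xhtml = "<html xmlns='http://www.w3.org/1999/xhtml'>"
            + "<head><title>Hello World</title></head>"
            + "<body><p>Hello World!</p></body></html>";
        System.out.println(transform(IDENTITY_XSL, xhtml));
    }
}
```

If the tags come back intact, the parser and the transformer are both doing their jobs and the problem lies elsewhere.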

Once you are over this hurdle, take on the JTidy processing. Inspect the raw
HTML in and the generated XHTML out. Does the JTidy-generated XHTML file
have all of the necessary identification and namespace declarations that you
learned about from your manually-composed XHTML file?
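To see why the namespace declaration matters so much, here is a small sketch of the failure mode. Everything in it is hypothetical: the class name NamespaceDemo, the one-template stylesheet, and the "para" output element are made up for the demonstration and are not taken from xhtml2fo.xsl. It assumes only the standard JAXP API. The template matches p in the XHTML namespace, so the same content with the xmlns declaration missing falls through to the built-in rules and only the text comes out. I am not claiming this is what went wrong in your case, only that it produces a similar symptom.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical stylesheet: one template, matching <p> in the XHTML namespace only.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:h='" + XHTML_NS + "'"
      + " exclude-result-prefixes='h'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='h:p'>"
      + "<para><xsl:value-of select='.'/></para>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies XSL to the given XML text and returns the serialized result.
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String body = "<body><p>Hello World!</p></body></html>";
        // With the namespace declared, the h:p template fires.
        System.out.println(transform("<html xmlns='" + XHTML_NS + "'>" + body));
        // Without it, no template matches and the built-in rules emit bare text.
        System.out.println(transform("<html>" + body));
    }
}
```

The first call wraps the paragraph text in a para element; the second produces the text with no markup around it at all.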

Having passed two hurdles, the problem of getting the xhtml2fo translation
to work should be a little easier. Before running the FOP processing, take
a close look at the XML result Xalan produces when it applies the
stylesheet. It should look like a valid XML file with lots of additional
"fo:" elements inserted to wrap your original content.
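A quick sanity check on that intermediate file can also be automated. The sketch below assumes only the standard JAXP DOM parser; the class name FoRootCheck and the method name looksLikeFo are hypothetical. It tests one thing: whether the document parses and its root element is "root" in the XSL-FO namespace (http://www.w3.org/1999/XSL/Format). A file containing only bare text, with no elements, fails to parse and is reported as not being FO.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FoRootCheck {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Returns true only if the text is well-formed XML whose root is fo:root.
    static boolean looksLikeFo(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // without this, getNamespaceURI() returns null
        try {
            Element root = f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            return FO_NS.equals(root.getNamespaceURI())
                && "root".equals(root.getLocalName());
        } catch (SAXException notWellFormed) {
            // Bare text with no root element lands here.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(looksLikeFo("<fo:root xmlns:fo='" + FO_NS + "'/>"));
        System.out.println(looksLikeFo("Hello World"));
    }
}
```

Passing this check does not mean FOP will accept the file, but failing it means there is no point in running FOP yet.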

Once that is credible, give the FOP processing and PDF output generator
another try.

Good luck, and I hope this message from bitter experience helps.



-----Original Message-----
From: John Brown [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 01, 2007 9:30 PM
To: xalan-j-users@xml.apache.org
Subject: Cannot convert XHTML to XML-FO using Xalan-J and
xhtml2fo.xsl stylesheet



First of all, I know nothing about Xalan.

The following article in the JavaWorld Forums:

http://www.javaworld.com/javaworld/jw-04-2006/jw-0410-html.html

explains how to convert HTML to PDF by using JTidy, Xalan and Apache FOP.
The steps are as follows:
1) HTML -> XHTML using jtidy-04aug2000r7-dev
2) XHTML -> XML-FO using xalan-j 2.7.0
3) XML-FO -> PDF using Apache fop-0.20.5

My problem is that Xalan does not seem to be working.

I am using Windows XP, and I have these packages installed in
c:\downloads\utils\jtidy-04aug2000r7-dev
c:\downloads\utils\xalan-j_2_7_0
c:\downloads\utils\fop-0.20.5
(All binary distributions; I did not compile them myself)
Java 1.6 JRE (1.6.0_03) in C:\program files\java\jre1.6.0_03
Java 1.6 JDK (1.6.0_03) in C:\program files\java\jdk1.6.0_03

xhtml2fo.xsl is an XML stylesheet for converting the output of JTidy to
XML-FO. This stylesheet is supplied with the article.


Following the instructions in the article, I ran the following commands:
java -cp c:\downloads\utils\jtidy-04aug2000r7-dev\Tidy.jar \
  org.w3c.tidy.Tidy -asxml hello.html > temp.xml

java -cp \
  c:\downloads\utils\xalan-j_2_7_0\xalan.jar\
  ;c:\downloads\utils\xalan-j_2_7_0\xercesImpl.jar\
  ;c:\downloads\utils\xalan-j_2_7_0\xml-apis.jar\
  ;c:\downloads\utils\xalan-j_2_7_0\serializer.jar\
  org.apache.xalan.xslt.Process -TT -TG -TS -TTC -IN temp.xml\
  -XSL xhtml2fo.xsl -OUT temp.fo

c:\downloads\utils\fop-0.20.5\fop.bat temp.fo hello.pdf

Of course, I did not really use the "\" character. Each command is on
one line, with no unnecessary white space.


JTidy produces the following XML file (the list archive stripped the markup;
only the text content survives):

Hello World

Hello World!

This seems to be OK.

The console output of xalan is:
null Line #0, Column #0: template match='/' 
file:///C:/Downloads/Utils/html2pdf/xhtml2fo.xsl Line #0, Column #0:
apply-templates
Selected source node '#document', at file
'file:///C:/Downloads/Utils/html2pdf/temp.xml', line #-1, column #-1
(default root rule) apply-templates, select='null': 
     10001: html
null Line #0, Column #0: template match='*' 
file:///C:/Downloads/Utils/html2pdf/xhtml2fo.xsl Line #0, Column #0:
apply-templates
Selected source node 'html', at file
'file:///C:/Downloads/Utils/html2pdf/temp.xml', line #-1, column #-1
(default rule) apply-templates, select='null': 
     10004: #text
     10005: head
     1000e: #text
     1000f: body
     10014: #text
STARTDOCUMENT
CHARACTERS: 

null Line #0, Column #0: template match='*' 
file:///C:/Downloads/Utils/html2pdf/xhtml2fo.xsl Line #0, Column #0:
apply-templates
Selected source node 'head', at file
'file:///C:/Downloads/Utils/html2pdf/temp.xml', line #-1, column #-1
(default rule) apply-templates, select='null': 
     10006: #text
     10007: meta
     1000a: #text
     1000b: title
     1000d: #text
CHARACTERS: 

null Line #0, Column #0: template match='*' 
file:///C:/Downloads/Utils/html2pdf/xhtml2fo.xsl Line #0, Column #0:
apply-templates
Selected source node 'meta', at file
'file:///C:/Downloads/Utils/html2pdf/temp.xml', line #-1, column #-1
(default rule) apply-templates, select='null': 
     [empty node list]
CHARACTERS: 

null Line #0, Column #0: template match='*' 
file:///C:/Downloads/Utils/html2pdf/xhtml2fo.xsl Line #0, Column #0:
apply-templates
Selected source node 'title', at file
'file:///C:/Downloads/Utils/html2pdf/temp.xml', line #-1, column #-1
(default rule) apply-templates, select='null': 
     1000c: #text
CHARACTERS: Hello World
CHARACTERS: 

CHARACTERS: 

null Line #0, Column #0: template match='*' 
file:///C:/Downloads/Utils/html2pdf/xhtml2fo.xsl Line #0, Column #0:
apply-templates
Selected source node 'body', at file
'file:///C:/Downloads/Utils/html2pdf/temp.xml', line #-1, column #-1
(default rule) apply-templates, select='null': 
     10010: #text
     10011: p
     10013: #text
CHARACTERS: 

null Line #0, Column #0: template match='*' 
file:///C:/Downloads/Utils/html2pdf/xhtml2fo.xsl Line #0, Column #0:
apply-templates
Selected source node 'p', at file
'file:///C:/Downloads/Utils/html2pdf/temp.xml', line #-1, column #-1
(default rule) apply-templates, select='null': 
     10012: #text
CHARACTERS: Hello World!
CHARACTERS: 

CHARACTERS: 

ENDDOCUMENT

I do not understand any of this, but all those lines with line #-1,
column #-1 cannot be good. The resulting file is:




Hello World


Hello World!

This is obviously wrong.

The article supplied a file Html2Pdf.java, which is supposed to be the
equivalent of the sequence of commands. I compiled it, and it works.

What am I doing wrong?