Daniel, if I were you I'd add an XSLT stage to your pipeline to clean up stuff like
this prior to serialization.
Add empty templates for the things you want to biff.
<xsl:template match="comment()"/>
<xsl:template match="*[contains(@style,'display:none')]"/>
<xsl:template match="@align|@color"/>
etc.
Use a template containing just "apply-templates" to strip tags while leaving their
contents:
<xsl:template match="font|basefont|center">
<xsl:apply-templates/>
</xsl:template>
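Add an identity template to copy everything else (note the xsl:copy wrapper; without it nothing is actually copied to the output):
<xsl:template match="@*|*">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>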
You can handle white space in XSLT too; see http://www.w3.org/TR/xslt#strip
For example, to strip whitespace-only text nodes from inside any element, you could
use the following:
<xsl:strip-space elements="*"/>
To collapse runs of white space to a single space, you can use the XPath function
"normalize-space()" (it also trims leading and trailing white space), e.g.
<xsl:template match="text()">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
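One caveat with that template: because normalize-space() also trims, in mixed content such as <p>Hello <b>world</b></p> the space before the <b> is lost. Here is a rough sketch of a variant that puts back a single space at either end wherever the original text node began or ended with white space. The variable name "n" is just my choice, and the identity template is only included so the stylesheet stands on its own. Whitespace-only text nodes still produce nothing in this version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy elements and attributes unchanged -->
  <xsl:template match="@*|*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <!-- "n" (hypothetical name) holds the trimmed, collapsed text -->
    <xsl:variable name="n" select="normalize-space()"/>
    <!-- Emit one leading space if the first character was white space -->
    <xsl:if test="$n != '' and normalize-space(substring(., 1, 1)) = ''">
      <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:value-of select="$n"/>
    <!-- Emit one trailing space if the last character was white space -->
    <xsl:if test="$n != '' and normalize-space(substring(., string-length(.))) = ''">
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```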
In summary, you can handle all of this in a single XSLT stylesheet and use the regular
HTML or XML serializer just as before.
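For what it's worth, here is how the pieces above might fit together in one stylesheet. Treat it as a sketch rather than a drop-in: I'm assuming your elements are in no namespace (plain HTML rather than XHTML), and the list of whitespace-sensitive elements (pre, textarea, script, style) is my own guess at what you'd want to protect from stripping and normalizing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes everywhere... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except directly inside these elements (assumed list) -->
  <xsl:preserve-space elements="pre textarea script style"/>

  <!-- Identity: copy any element or attribute not matched more specifically -->
  <xsl:template match="@*|*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Remove comments, hidden elements and presentational attributes -->
  <xsl:template match="comment()"/>
  <xsl:template match="*[contains(@style,'display:none')]"/>
  <xsl:template match="@align|@color"/>

  <!-- Unwrap presentational elements but keep their contents -->
  <xsl:template match="font|basefont|center">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Collapse white space in ordinary text -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space()"/>
  </xsl:template>

  <!-- Leave text inside whitespace-sensitive elements untouched
       (this multi-step pattern has a higher default priority than text()) -->
  <xsl:template match="pre//text()|textarea//text()|script//text()|style//text()">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Bear in mind that the 'display:none' test is a plain substring match, so it won't catch variants like 'display: none' with a space in them.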
Cheers
Con
--
Conal Tuohy
Senior Programmer
+64-4-463-6844
+64-21-237-2498
New Zealand Electronic Text Centre
www.nzetc.org
-----Original Message-----
From: Daniel Willis [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 28 September 2004 4:49 p.m.
To: [EMAIL PROTECTED]
Subject: Removing white space, comments, etc
Hello,
I've had a good look through the wiki, and across the internet in general, and I've
been unable to find anything useful in relation to the stripping of white space and
undesirable elements (comments, 'display:none', etc.).
Could anyone suggest a good method of doing this, or is there an existing HTML
serializer that we could use?
Kind Regards,
Daniel Willis,
Web developer.
http://www.tvnz.co.nz