Daniel, if I were you I'd add an XSLT stage to your pipeline to clean up stuff like 
this prior to serialization. 

Add empty templates for the things you want to biff; a node matched by an empty template 
produces no output:

<xsl:template match="comment()"/>

<xsl:template match="*[contains(@style,'display:none')]"/>

<xsl:template match="@align|@color"/>

etc.
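One thing to watch: contains(@style,'display:none') only matches that exact spelling, so 
style="display: none" or style="DISPLAY:NONE" would slip through. A minimal sketch, 
assuming hidden elements are marked with inline style attributes (not CSS classes), is to 
lower-case the attribute and drop its spaces with translate() before testing. It is meant 
to be used in place of the display:none template above, not alongside it, since both 
patterns have the same priority:

```xml
<!-- Remove hidden elements regardless of spacing or case in @style.
     translate() maps A-Z to a-z; the trailing space in the second argument
     has no counterpart in the third, so spaces are deleted. -->
<xsl:template match="*[contains(translate(@style,
                                          'ABCDEFGHIJKLMNOPQRSTUVWXYZ ',
                                          'abcdefghijklmnopqrstuvwxyz'),
                                'display:none')]"/>
```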

Use a template containing just "apply-templates" to strip tags while leaving their 
contents:

<xsl:template match="font|basefont|center">
<xsl:apply-templates/>
</xsl:template>

Add an identity template to copy everything else:

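<xsl:template match="@*|*">
        <!-- copy the current element or attribute, then process its
             attributes and children inside the copy -->
        <xsl:copy>
                <xsl:apply-templates select="@*"/>
                <xsl:apply-templates/>
        </xsl:copy>
</xsl:template>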

You can handle white space in XSLT too; see http://www.w3.org/TR/xslt#strip

For example, to strip whitespace-only text nodes from inside any element, you could use 
the following:

<xsl:strip-space elements="*"/>
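Stripping every whitespace-only text node is usually harmless in HTML, but inside <pre> 
or <textarea> the white space is content. A minimal sketch, assuming those two are the 
only such elements in your pages, is to exempt them with xsl:preserve-space; the specific 
element names outrank the "*" wildcard when the processor resolves the conflict, so the 
exemption wins:

```xml
<!-- Strip whitespace-only text nodes from every element... -->
<xsl:strip-space elements="*"/>
<!-- ...except those whose parent is pre or textarea -->
<xsl:preserve-space elements="pre textarea"/>
```

This only protects whitespace-only nodes whose direct parent is pre or textarea; one 
nested inside, say, a <b> within a <pre> is still stripped.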

To collapse runs of white space to a single space, you can use the XPath function 
normalize-space(), which also strips leading and trailing white space. For example:

<xsl:template match="text()">
        <xsl:value-of select="normalize-space()"/>
</xsl:template>
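Because normalize-space() trims each text node, text split across inline elements can 
lose its word breaks: "Hello <b>world</b>" comes out as "Hello<b>world</b>", and text 
inside <pre> is flattened as well. A sketch of one way around both, assuming plain 
(un-namespaced) HTML element names, copies pre/textarea text unchanged and otherwise keeps 
a single space at either end of a text node where the original had white space:

```xml
<!-- Keep text inside pre and textarea exactly as it is.
     Default priority 0.5 beats the -0.5 of the plain text() pattern. -->
<xsl:template match="pre//text()|textarea//text()">
  <xsl:copy/>
</xsl:template>

<!-- Collapse white space elsewhere, but keep one space at each end
     of the text node if the original had white space there. -->
<xsl:template match="text()">
  <xsl:variable name="collapsed" select="normalize-space(.)"/>
  <xsl:choose>
    <!-- whitespace-only node: reduce it to a single space -->
    <xsl:when test="$collapsed = ''">
      <xsl:text> </xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <!-- first character was white space -->
      <xsl:if test="normalize-space(substring(., 1, 1)) = ''">
        <xsl:text> </xsl:text>
      </xsl:if>
      <xsl:value-of select="$collapsed"/>
      <!-- last character was white space -->
      <xsl:if test="normalize-space(substring(., string-length(.), 1)) = ''">
        <xsl:text> </xsl:text>
      </xsl:if>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

If you also use <xsl:strip-space elements="*"/>, bear in mind it removes whitespace-only 
nodes such as the space in "<b>a</b> <i>b</i>" before this template ever sees them.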

In summary, you can handle all of this in a single XSLT stylesheet and use the regular 
HTML or XML serializer just as before.
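Putting the templates from this message together, a complete stylesheet might look like 
the following. It is a sketch: it assumes the markup reaching this stage uses plain, 
un-namespaced element names (if your pages are in the XHTML namespace, patterns such as 
font would need a prefix bound to http://www.w3.org/1999/xhtml), and it leaves output 
settings to the serializer that follows it in the pipeline.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes everywhere -->
  <xsl:strip-space elements="*"/>

  <!-- Identity: copy any element or attribute not matched more specifically -->
  <xsl:template match="@*|*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty templates: remove comments, hidden elements, presentational attributes -->
  <xsl:template match="comment()"/>
  <xsl:template match="*[contains(@style,'display:none')]"/>
  <xsl:template match="@align|@color"/>

  <!-- Unwrap: drop the tag but keep its contents -->
  <xsl:template match="font|basefont|center">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Collapse white space in the remaining text nodes -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space()"/>
  </xsl:template>

</xsl:stylesheet>
```

The more specific patterns (a name such as font or @align, or a pattern with a predicate) 
have a higher default priority than the identity template's @* and *, which is why the 
empty and unwrapping templates win for the nodes they match.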

Cheers

Con
--
Conal Tuohy
Senior Programmer
+64-4-463-6844
+64-21-237-2498
New Zealand Electronic Text Centre
www.nzetc.org

-----Original Message-----
From: Daniel Willis [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 28 September 2004 4:49 p.m.
To: [EMAIL PROTECTED]
Subject: Removing white space, comments, etc


Hello,
I've had a good look through the wiki, and across the internet in general, and I've 
been unable to find anything useful in relation to the stripping of white space and 
undesirable elements (comments, 'display:none', etc.).
Could anyone suggest a good method of doing this, or is there an existing HTML 
serializer that we could use?
Kind Regards,
Daniel Willis,
Web developer.
http://www.tvnz.co.nz
