In this case, the STX transformer would probably be faster, since it is SAX-based and streams events rather than building an in-memory document. The syntax isn't much different from XSLT.

There's also a cleanup/indent transformer that I put into bugzilla a little while back (http://issues.apache.org/bugzilla/show_bug.cgi?id=30018). It wouldn't solve your comment stripping issue, but it would certainly make your output easier to read. I should probably rewrite it as a serializer sometime, but then I'd have to deal with encoding issues and the like. I'd much prefer that Cocoon got compound sitemap components (serializer = transformer(s) + serializer) and used the cleanup/indent transformer as part of a larger serialization step.

The case of display:none is tricky if you use the style tag or an external stylesheet. I don't believe there are any facilities to handle that at the moment, and the only way I can think of to handle it would be to code a full browser's CSS logic sans rendering widget -- a non-trivial exercise to be sure. If the issue is output file size, have you considered gzip-compressing the output with either a servlet filter or Apache's mod_deflate? Compression will likely do more good than simply stripping the content, and I doubt the CPU overhead would be that much higher in the grand scheme of things if you cache intelligently.
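To get a rough feel for what compression buys before wiring up a filter or mod_deflate, something like the following could be run on a sample page. It is only a sketch using the JDK's GZIPOutputStream; the class name GzipSizeCheck and the generated sample markup are made up for illustration, and real numbers will depend on your actual output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compares raw and gzipped sizes of a page.
final class GzipSizeCheck {

    // Gzip a byte array in memory and return the compressed bytes.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer.
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up sample: repetitive markup with comments and padding,
        // standing in for real pipeline output.
        StringBuilder page = new StringBuilder("<html><body>\n");
        for (int i = 0; i < 200; i++) {
            page.append("  <!-- row ").append(i).append(" -->\n")
                .append("  <div class=\"row\">   item ").append(i)
                .append("   </div>\n");
        }
        page.append("</body></html>\n");

        byte[] raw = page.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzipped bytes: " + gzip(raw).length);
    }
}
```

Feeding it the same page with and without comments and whitespace stripped would show how much stripping still matters once compression is switched on.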

- Miles Elam


Conal Tuohy wrote:

Daniel, if I were you I'd add an XSLT stage to your pipeline to clean up stuff like this prior to serialization.
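As a rough sketch of what such a cleanup stylesheet might look like, here is one embedded in a small Java harness so it can be tried outside Cocoon. The class name CommentStripper and the sample input are made up for illustration; in a real pipeline the stylesheet would presumably live in its own file and run as a transform step before the serializer. It only catches display:none in inline style attributes, not rules from a style tag or an external stylesheet, and dropping whitespace-only text nodes can change how inline elements are spaced in HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical harness around an illustrative cleanup stylesheet.
final class CommentStripper {

    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Identity template: copy everything not matched below.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
        // Drop comments (explicit priority avoids a tie with node()).
      + "<xsl:template match='comment()' priority='1'/>"
        // Drop whitespace-only text nodes.
      + "<xsl:template match=\"text()[normalize-space()='']\"/>"
        // Drop elements hidden via an inline style attribute.
      + "<xsl:template match=\"*[contains(translate(@style, ' ', ''),"
      + " 'display:none')]\"/>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet to an XML string and return the result.
    static String strip(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String input = "<div>\n  <!-- note -->\n  <p>visible</p>\n"
                     + "  <p style=\"display: none\">hidden</p>\n</div>";
        System.out.println(strip(input));
    }
}
```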


-----Original Message-----
From: Daniel Willis [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 28 September 2004 4:49 p.m.
To: [EMAIL PROTECTED]
Subject: Removing white space, comments, etc


Hello, I've had a good look through the wiki, and across the internet in general, and I've been unable to find anything useful in relation to the stripping of white space and undesirable elements (comments, 'display:none', etc.). Could anyone suggest a good method of doing this, or is there an existing HTML serializer that we could use?

