I've been using Tidy for a couple of years to dynamically clean up HTML and it works like a charm. Yes, it's strict - but that's one of the things I like about it :-). Just for kicks, you can check out the effect of using Tidy by viewing the source of these two links:

http://hugi.karlmenn.is/?useTidy=false
http://hugi.karlmenn.is/?useTidy=true

But recently, I'm more interested in using the DOM to manipulate the response. There's just something perversely delightful about manipulating pages as object hierarchies rather than strings. Unfortunately, reality usually requires working with some badly formed HTML, and in that respect, Tidy is nice - it is forgiving and will attempt to fix your terrible, disgusting HTML. So, an example of what you can do:

---

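public WOResponse dispatchRequest( WORequest request ) {
        // Let the superclass generate the response first, then post-process it
        WOResponse response = super.dispatchRequest( request );
        ByteArrayInputStream in = response.content().stream();

        // Passing null as the output stream: we only want the DOM back from Tidy
        Document d = tidy().parseDOM( in, null );

        NodeList divNodes = d.getElementsByTagName( "div" );
        int i = divNodes.getLength();

        // Walk the divs back to front and append a text node to each
        while( i > 0 ) {
                Node n = divNodes.item( --i );
                n.appendChild( d.createTextNode( "YARRRR, A MIGHTY FINE DIV I WAS" ) );
        }

        // convertDOMDocumentToString() is a helper that is not shown here
        String prettyPrintedDocument = convertDOMDocumentToString( d );
        response.setContent( prettyPrintedDocument );
        return response;
}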

---

Which is nice. Apart from the facts that (a) Tidy seems to provide a rather lacklustre implementation of org.w3c.dom.Document, which I have no idea how to work around, and (b) I don't know **** about the DOM yet (although I know enough to guess the API sucks, right?). I'm still just experimenting.
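
For what it's worth, convertDOMDocumentToString isn't shown above. A minimal sketch of one way to write it, assuming a standards-compliant org.w3c.dom.Document and nothing but the JDK's own javax.xml.transform classes, might look like the code below. The class name DomSketch and the parse and appendTextToDivs helpers are made up for illustration; parse stands in for tidy().parseDOM() and only handles well-formed markup. Given (a), there's no guarantee the Transformer is happy with the Document that Tidy hands back, so treat this as an experiment, not a recipe.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, for illustration only
final class DomSketch {

    // Stand-in for tidy().parseDOM(): uses the JDK parser, so the input must be well-formed
    static Document parse( String xml ) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse( new InputSource( new StringReader( xml ) ) );
        }
        catch( Exception e ) {
            throw new RuntimeException( e );
        }
    }

    // Same idea as the loop in dispatchRequest: append a text node to every div, back to front
    static void appendTextToDivs( Document d, String text ) {
        NodeList divNodes = d.getElementsByTagName( "div" );
        for( int i = divNodes.getLength() - 1; i >= 0; i-- ) {
            divNodes.item( i ).appendChild( d.createTextNode( text ) );
        }
    }

    // One possible convertDOMDocumentToString: serialize through an identity Transformer
    static String convertDOMDocumentToString( Document d ) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty( OutputKeys.METHOD, "html" );
            t.setOutputProperty( OutputKeys.INDENT, "yes" );
            StringWriter w = new StringWriter();
            t.transform( new DOMSource( d ), new StreamResult( w ) );
            return w.toString();
        }
        catch( Exception e ) {
            throw new RuntimeException( e );
        }
    }

    public static void main( String[] args ) {
        Document d = parse( "<html><body><div>Ahoy. </div><p>No div here</p></body></html>" );
        appendTextToDivs( d, "YARRRR, A MIGHTY FINE DIV I WAS" );
        System.out.println( convertDOMDocumentToString( d ) );
    }
}
```

Wrapping the checked exceptions in RuntimeException just keeps the sketch short; in a real dispatchRequest you would probably rather log the failure and return the untouched response.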

Sorry for the long post about nothing... I guess I just like to touch my keyboard.

- hugi

// Hugi Thordarson
// http://hugi.karlmenn.is/




On 18.4.2008, at 22:24, Mike Schrag wrote:
How extreme do you want to get? You can work some wonders with your HTML in dispatchRequest.

Personally, I find that this generates the cleanest possible response:

public WOResponse dispatchRequest( WORequest request ) {
        WOResponse response = super.dispatchRequest( request );
        response.setContent( "" );
        return response;
}

I experimented last year with running output through tidy, but it ends up breaking all kinds of things (tidy is just too strict for most HTML):

  public WOResponse dispatchRequest(WORequest request) {
    WOResponse response = super.dispatchRequest(request);

    if (MDTApplication.contentTypeHTML(response)) {
      ByteArrayInputStream in = response.content().stream();
      ERXRefByteArrayOutputStream out = new ERXRefByteArrayOutputStream();
      tidy().parseDOM(in, out);
      response.setContent(out.toNSData());
    }

    return response;
  }

Possibly exporting the formatter from WOLips to an external jar might give better results because it's designed to be very forgiving about how it interprets your HTML, but in the scheme of things, this is probably not worth the effort.

ms
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/webobjects-dev/hugi%40karlmenn.is

This email sent to [EMAIL PROTECTED]
