On Wed, Apr 2, 2008 at 11:35 AM, Richard K Miller <[EMAIL PROTECTED]> wrote: > Adrian Holovaty (creator of ChicagoCrime.org and Django) has a Python > script called templatemaker[1][2], which in theory would do what I want. You > feed it a bunch of similar web pages and it produces a template with "holes" > where the data was different across each web page. In practice, it's too > granular; it doesn't recognize HTML. It looks at every I don't care about > spaces between tags. I only care about substantial content differences > across pages. Everything else can be moved to the template.
you could try running everything through HTML Tidy first, see if that normalizes whitespace and such. then run templatemaker and see how that works out. justin -- http://justinhileman.com _______________________________________________ UPHPU mailing list [email protected] http://uphpu.org/mailman/listinfo/uphpu IRC: #uphpu on irc.freenode.net
