On Apr 2, 2008, at 4:43 PM, thebigdog wrote:
Adrian Holovaty (creator of ChicagoCrime.org and Django) has a Python
script called templatemaker[1][2], which in theory would do what I
want. You feed it a bunch of similar web pages and it produces a
template with "holes" where the data was different across each web
page. In practice, it's too granular; it doesn't recognize HTML. It
looks at every I don't care about spaces between tags. I only care
about substantial content differences across pages. Everything else
can be moved to the template.
you could try running everything through HTML Tidy first, see if that
normalizes whitespace and such. then run templatemaker and see how
that works out.
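
A rough sketch of the "normalize first" idea, in Python since templatemaker is a Python script. This is not HTML Tidy itself, just a regex stand-in for the whitespace part of what Tidy would do; the function name and the sample pages are made up for illustration. It would also mangle whitespace inside <pre> blocks, which real Tidy handles properly.

```python
import re

def normalize_html_whitespace(html):
    """Crude stand-in for HTML Tidy: remove cosmetic whitespace differences."""
    # Collapse every run of spaces, tabs and newlines into a single space.
    html = re.sub(r"\s+", " ", html)
    # Drop the whitespace that sits between two adjacent tags.
    html = re.sub(r">\s+<", "><", html)
    return html.strip()

# Two hypothetical pages that differ only in formatting.
page_a = "<ul>\n  <li>Burglary</li>\n  <li>1200 W Main</li>\n</ul>"
page_b = "<ul><li>Burglary</li>   <li>1200 W Main</li></ul>\n"

print(normalize_html_whitespace(page_a) == normalize_html_whitespace(page_b))
```

Once pages that differ only in spacing compare equal, anything that still differs should be actual content.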
you could use a diff program to find out where they are different
and then kinda do the reverse and come up with the
similarities...however i would do it after running it all through
tidy first.
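
Here is one way the diff idea could look, as a sketch only. It uses Python's difflib rather than an external diff program, and compares whole tags and text runs instead of characters, so spacing between tags never shows up as a difference. The "{{hole}}" marker, the function names and the sample pages are all assumptions for the example.

```python
import re
from difflib import SequenceMatcher

HOLE = "{{hole}}"  # hypothetical marker for "the pages differ here"

def tokenize(html):
    # Split into tags and the text between them; drop whitespace-only pieces.
    parts = re.split(r"(<[^>]+>)", html)
    return [p.strip() for p in parts if p.strip()]

def common_template(html_a, html_b):
    a, b = tokenize(html_a), tokenize(html_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])   # identical on both pages: keep in template
        else:
            out.append(HOLE)       # differs between pages: becomes a hole
    return "".join(out)

page_a = "<h1>Burglary</h1><p>1200 W Main</p>"
page_b = "<h1>Theft</h1>\n<p>300 S State</p>"
print(common_template(page_a, page_b))
```

This only compares two pages. With more pages you would have to fold each new page into the template in turn, which this sketch does not attempt.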
If it was up to me then i would look at taking 1 page and creating a
template from it and then extract all the data you need to populate
other pages with that template.
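
The extraction half of that could be sketched like this in Python: take a template with holes in it, turn it into a regular expression, and pull out whatever fills the holes on another page. The "{{hole}}" marker, the function names and the sample strings are made up for the example, and it assumes the page's whitespace has been normalized the same way as the template's.

```python
import re

HOLE = "{{hole}}"  # hypothetical marker used in the hand-made template

def template_to_regex(template):
    # Escape the fixed parts; every hole becomes a non-greedy capture group.
    literal_parts = [re.escape(part) for part in template.split(HOLE)]
    return re.compile("(.*?)".join(literal_parts), re.DOTALL)

def extract(template, page):
    # Return the hole contents in order, or None if the page doesn't fit.
    match = template_to_regex(template).fullmatch(page)
    return list(match.groups()) if match else None

template = "<h1>{{hole}}</h1><p>{{hole}}</p>"
print(extract(template, "<h1>Theft</h1><p>300 S State</p>"))
print(extract(template, "<div>something else entirely</div>"))
```

A page that doesn't fit the template gives None back, which is a cheap way to spot pages that need a different template.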
Thanks, Justin and Ray. Good ideas.
_______________________________________________
UPHPU mailing list
http://uphpu.org/mailman/listinfo/uphpu
IRC: #uphpu on irc.freenode.net