"David A. Desrosiers" said:
> Ok, clearly I have to spend a weekend and really understand how
> the back-end architecture of sitescooper works.
Thankfully, it's not too tricky!
Whereas most mirroring software fetches pages down to a specified link
depth and treats them all alike, sitescooper lets you specify regular
expressions that signify that a page is at a certain depth.
For example, given a multi-section news site like this:
  http://foo.com/                     front page, links to:
  http://foo.com/business/            business news
  http://foo.com/food/                food news (?)
  http://foo.com/blah/                news about blah
  http://foo.com/blah/story512.html   a story page
the sitescooper model is to define regexps like this:
  URL:         http://foo.com/
  ContentsURL: http://foo.com/(business|food|blah)/
  StoryURL:    http://foo.com/\S+/story\d+.html
Typically each of those "levels" has a different page layout, so being
able to differentiate them like this means you can define what bits of the
page HTML to strip, based on what level the page is at.
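
If it helps to see the idea outside of a site file, here is a rough Python
sketch of "classify a URL by which level's regexp it matches". It is not how
sitescooper itself is implemented; the level names, the classify() helper and
the choice to match the whole URL are all assumptions made for illustration.
Only the three patterns come from the example above.

```python
import re

# Patterns copied from the example above. The level names ("front",
# "contents", "story") are illustrative labels, not sitescooper keywords.
LEVELS = [
    ("front", r"http://foo.com/"),
    ("contents", r"http://foo.com/(business|food|blah)/"),
    ("story", r"http://foo.com/\S+/story\d+.html"),
]


def classify(url):
    """Return the name of the first level whose pattern matches the
    entire URL, or None if no level matches.

    Assumption: patterns must match the whole URL. Whether sitescooper
    anchors its patterns this way is not stated in this message.
    """
    for name, pattern in LEVELS:
        if re.fullmatch(pattern, url):
            return name
    return None
```

With the level in hand, a scraper could look up a per-level set of stripping
rules, e.g. one set for contents pages and another for story pages.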
--j.