(sorry about the delay -- a move to Oz left me with intermittent net access
for a while there ;)
Michael Graham said:
> I spent a little time this weekend, and came up with the attached updates:
> globe_and_mail_national.site - National news
> globe_and_mail_columnists.site - Noted Columnists
> globe_and_mail_toronto.site - Toronto news
> globe_and_mail_thearts.site - Arts and Entertainment
Cool -- I'll check these in. Thanks!
> One problem I've noticed is that sometimes the same stories get scooped
> more than once. I think this is because the Story URLs contain a lot of
> parameters and this fools sitescooper into thinking that it hasn't seen
> URLs when it actually has.
> Is there a way around this? I'm thinking of some kind of URL
> transformation hook that would run right before sitescooper ran its cache
> check. Then I could strip out all but the essential story info from the URL.
Yep -- there's a URLProcess directive, which is run whenever a new URL
is found. It lets you rewrite the URL using Perl code. Take a look
at linux/slashdot.site; Slashdot has a lot of parameters for viewing
different levels of comments, so the slashdot site file does some magic to
force all stories to be viewed with the same set of parameters.
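
To make the idea concrete, here's a rough sketch of the kind of rewrite
you'd want. It's in Python purely for illustration: URLProcess itself takes
Perl, and this is not the code from slashdot.site. The parameter name "id",
the example URLs and the helper names normalize_url and is_new are all
made up; you'd substitute whatever actually identifies a story on the
Globe and Mail site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: only these query parameters identify a story.
# "id" is a hypothetical name, not taken from any real site.
ESSENTIAL_PARAMS = {"id"}


def normalize_url(url, keep=ESSENTIAL_PARAMS):
    """Drop every query parameter except the ones in `keep`."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in keep
    )
    # Rebuild the URL with the reduced query string and no fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Stand-in for a "have I seen this already?" cache, keyed on the
# normalized URL rather than the raw one.
seen = set()


def is_new(url):
    key = normalize_url(url)
    if key in seen:
        return False
    seen.add(key)
    return True
```

With that, two URLs that differ only in the throwaway parameters map to the
same cache key, so the second one is treated as already seen.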
> BTW, to the authors of sitescooper: thanks for such a great program!
cheers!
--j.
_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/sitescooper-talk