(sorry about the delay -- a move to Oz left me with intermittent net access
for a while there ;)

Michael Graham said:

> I spent a little time this weekend, and came up with the attached updates:
>     globe_and_mail_national.site    - National news
>     globe_and_mail_columnists.site  - Noted Columnists
>     globe_and_mail_toronto.site     - Toronto news
>     globe_and_mail_thearts.site     - Arts and Entertainment

Cool -- I'll check these in.  Thanks!

> One problem I've noticed is that sometimes the same stories get scooped
> more than once.  I think this is because the Story URLs contain a lot of
> parameters and this fools sitescooper into thinking that it hasn't seen
> URLs when it actually has.
> Is there a way around this?  I'm thinking of some kind of URL
> transformation hook that would run right before sitescooper ran its cache
> check.  Then I could strip out all but the essential story info from the URL.

Yep -- there's a URLProcess directive, which is run whenever a new URL
is found.  It lets you rewrite the URLs using Perl code.  Take a look
at linux/slashdot.site; Slashdot has a lot of parameters for viewing
different levels of comments, so the slashdot site file does some magic
to force all stories to be viewed with the same set of parameters.
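
To make the idea concrete, here's a rough sketch of the kind of rewrite
meant here: keep only the parameters that identify the story and drop the
rest, so two URLs for the same story compare equal at cache-check time.
It's in Python purely for illustration; the real hook takes Perl, and this
is not sitescooper's syntax.  The parameter name "id" and the example.com
URLs are made up, so check what the site's story URLs actually look like.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: the query parameters that identify a story.
# Not taken from any real site file.
ESSENTIAL_PARAMS = {"id"}

def canonical_url(url, keep=ESSENTIAL_PARAMS):
    """Return url with only the essential query parameters kept."""
    parts = urlsplit(url)
    # Keep the wanted parameters and sort them so ordering differences
    # between otherwise identical URLs do not matter.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in keep
    )
    # Drop the fragment as well; it never changes which story is fetched.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )

print(canonical_url("http://example.com/story?id=42&session=abc&ref=front"))
print(canonical_url("http://example.com/story?ref=rss&id=42#comments"))
```

Both calls come out as the same URL, which is the property the cache
check needs.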

> BTW, to the authors of sitescooper: thanks for such a great program!

cheers!

--j.

_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/sitescooper-talk
