Dear List:

I have been playing with some of the workarounds posted to this list by
Kennis Koldewyn (thanks!) to get the New York Times to cooperate with
sitescooper.

One problem: the level 2 URLs are processed, but long stories are
split into multiple pages.  Attempting to collect the sub-page URLs
with the Level2FollowLinks: 1 directive just throws you back to the
login screen.  I have found that the NYT server accepts level 2 story
URLs with the magic "?pagewanted=all" suffix appended, even when the
story has no following pages.  Is it possible to preprocess the
scooped level 2 URLs to append "?pagewanted=all" before they are
collected?  I do not know Perl, but the UrlProcess tag looks
promising.  Any thoughts?
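To make the idea concrete, here is the kind of thing I was imagining in
the nyt.site file.  This is untested guesswork on my part: I am assuming
UrlProcess is handed each URL in $_ as a line of Perl to evaluate, and
that it runs before the level 2 pages are fetched:

    # guess at the semantics: Perl evaluated per URL, with the URL in $_;
    # tack on the magic suffix unless it is already present
    UrlProcess: $_ .= "?pagewanted=all" unless /pagewanted=all/;

Again, I do not know Perl, so the syntax above may well be wrong; it is
only meant to show where the "?pagewanted=all" suffix would be tacked on.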

Regards,
Bennett Feitell


