here's a site file for Linux Weekly News (lwn.net), specifically the "Weekly Edition" (<http://lwn.net/current>, as linked to on the front page).
note: the scoops can get pretty big (ie 285079 KB) as i scoop the Weekly Edition and any "Full Story" links, but not "Comments" links. i do still get the comments that are at the end of "Full Story" pages, which i may work on pruning off with StoryHTMLPreProcess, but there usually aren't any comments so it's not a high priority.

if you don't want to follow "Full Story" links, so as to reduce the size of the scoop, then just comment out the StoryFollowLinks directive in the site file, though you'll still follow "Full Story" links that are found on the first page.

i tried to point sitescooper to just the first page and have it automagically follow the "Next page:" links (as advertised), but it didn't work, and i haven't looked at the sitescooper code to see what exactly it was looking for and whether LWN's "Next page:" qualified. is there a way to manually specify what it looks for (ie a directive) in following consecutive "pages"? so to follow the "Next page:" links, i had to specify a StoryURL, which the "Full Story" links fit, but not the "Comments" links (the two only differ by a trailing forward-slash).

i just created this two or three weeks ago, and since the weekly edition is only published once a week, i haven't had many editions/samples to test the site file against. heck, i've been too busy to even read the scoops except for the first one, but at least the site file scoops something (which is more than the old site file). anyways...

--
PLEASE REQUEST PERMISSION TO REDISTRIBUTE AUTHOR'S COMMENTS OR EMAIL ADDRESS.
# Linux Weekly News sitescooper site file
URL: http://lwn.net/current/
Name: Linux Weekly News
Levels: 2
ContentsStart: <!-- template MiddleColumn -->
ContentsEnd: <!-- Below ends the full table. -->
StoryStart: <!-- template MiddleColumn -->
StoryEnd: <!-- Below ends the full table. -->
StoryURL: http://lwn.net/Articles/\d+/
StoryHeadline: <tr><td class="Headline"><div class="C2HL"><b>(.*?)</b></div>
StoryFollowLinks: 1
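
for anyone who wants to poke at the two patterns outside of sitescooper, here's a rough Python sketch. it is not how sitescooper itself works (that's Perl, and i haven't read the matching code); it only tries the StoryURL and StoryHeadline patterns from the site file against made-up input. assumptions: the pattern has to match the whole URL, the link without the trailing slash is the "Comments" one, and the article number 12345 and the sample HTML row are invented for illustration. the helper names is_story_url and extract_headline are made up too.

```python
import re

# Patterns copied verbatim from the site file.
STORY_URL = r"http://lwn.net/Articles/\d+/"
STORY_HEADLINE = r'<tr><td class="Headline"><div class="C2HL"><b>(.*?)</b></div>'


def is_story_url(url):
    """Return True if the whole URL matches the StoryURL pattern.

    Whole-URL matching is an assumption about sitescooper's behaviour.
    """
    return re.fullmatch(STORY_URL, url) is not None


def extract_headline(html):
    """Return the first headline captured by StoryHeadline, or None."""
    match = re.search(STORY_HEADLINE, html)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical links: the article number is invented.
    with_slash = "http://lwn.net/Articles/12345/"
    without_slash = "http://lwn.net/Articles/12345"
    print(with_slash, is_story_url(with_slash))
    print(without_slash, is_story_url(without_slash))

    # Hypothetical HTML row shaped like the StoryHeadline pattern expects.
    sample = '<tr><td class="Headline"><div class="C2HL"><b>Leading items</b></div></td></tr>'
    print(extract_headline(sample))
```

under those assumptions only the link with the trailing slash counts as a story, which fits what i saw: "Full Story" links get followed and "Comments" links don't.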
