I've attached two .site files. washington_post.site scoops the contents of *all* sections of what the Web site calls the Print Edition of the Washington Post. (There aren't any images in the Print Edition.)
washington_post_natpol.site scoops the Nation and Politics section of the Print Edition of the Washington Post. More importantly, it contains instructions on how to use it as a template to scoop any single section you want. (Honestly, I didn't have the energy to create a .site file for every single section!)

Hope these are useful.

Regards,
Tim Kynerd

Sunrise in Stockholm today: 7:30
Sunset in Stockholm today: 15:33
My rail transit photos at http://www.kynerd.nu
# Washington Post Nation and Politics
# Author: Tim Kynerd, 9 November 2001 -- [EMAIL PROTECTED]
# This is a two-level site: the top level consists of the contents pages
# for the section; the second level is the stories.
# This example site fetches the Nation and Politics section of the Post.
# It can be easily copied and modified, using the instructions below, to
# scoop other sections.
URL: http://www.washingtonpost.com/wp-dyn/print/nationpolitics/
Name: Washington Post Nation and Politics
Description: National News from the Washington Post
Levels: 2
ContentsStart: -- date --
ContentsEnd: -- end print/body --
StoryURL: .*/wp-dyn/articles/.*html
# Below are URL lines for all sections of the Post.
# To scoop any section of the Post, simply substitute the appropriate line
# (without the # at the beginning, which marks a comment) for the URL line
# above (the first uncommented line above).
# You'll also need to change the Name and Description lines to correspond to
# the section you're scooping, as well as the name of the .site file itself.
#
# Front Page
# URL: http://www.washingtonpost.com/wp-dyn/print/a1/
#
# Inside the A Section
# URL: http://www.washingtonpost.com/wp-dyn/print/asection/
#
# Nation and Politics
# URL: http://www.washingtonpost.com/wp-dyn/print/nationpolitics/
#
# Editorial Pages
# URL: http://www.washingtonpost.com/wp-dyn/print/editorials/
#
# World
# URL: http://www.washingtonpost.com/wp-dyn/print/world/
#
# Business
# URL: http://www.washingtonpost.com/wp-dyn/print/business/
#
# Metro
# URL: http://www.washingtonpost.com/wp-dyn/print/metro/
#
# Sports
# URL: http://www.washingtonpost.com/wp-dyn/print/sports/
#
# Style
# URL: http://www.washingtonpost.com/wp-dyn/print/style/
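
The substitution described in the template's comments (swap the URL line, then update Name and Description) could also be scripted rather than done by hand. What follows is only a rough sketch in Python, not part of the attached files or of sitescooper itself: the function name make_section_site and its parameters are made up for illustration, and it assumes the template text has one directive per line, as in the attached file.

```python
# Hypothetical helper: retarget the single-section template at another section.
# The base URL comes from the template above; everything else is illustrative.
BASE = "http://www.washingtonpost.com/wp-dyn/print/"


def make_section_site(template: str, slug: str, name: str, description: str) -> str:
    """Return a copy of the .site text with URL, Name and Description replaced."""
    replacements = {
        "URL": BASE + slug + "/",
        "Name": name,
        "Description": description,
    }
    out = []
    for line in template.splitlines():
        key = line.split(":", 1)[0]
        # Commented lines such as "# URL: ..." have the key "# URL", so they
        # never match and are left alone. pop() makes sure only the first
        # uncommented occurrence of each key is rewritten.
        if key in replacements:
            out.append(f"{key}: {replacements.pop(key)}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "URL: http://www.washingtonpost.com/wp-dyn/print/nationpolitics/\n"
        "Name: Washington Post Nation and Politics\n"
        "Description: National News from the Washington Post\n"
        "Levels: 2\n"
    )
    print(make_section_site(sample, "sports", "Washington Post Sports",
                            "Sports from the Washington Post"))
```

The result would still need to be saved under a new file name by hand (or by whatever script calls this), since the template's comments note that the .site file itself has to be renamed too.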
# Washington Post
# Author: Tim Kynerd, 2 November 2001 -- [EMAIL PROTECTED]
# This is a three-level site: the top level is the list of sections of the
# paper; the second level consists of the contents pages for the sections;
# the third level is the stories.
# On a typical weekday, with all sections included, this will fetch about 2 MB
# of news. In the Plucker format, this ends up as a database about 600-650 KB
# in size.
URL: http://www.washingtonpost.com/wp-dyn/print/
Name: Washington Post
Description: News from the Washington Post
Levels: 3
IssueUseTableSmarts: 0
IssueLinksStart: -- End Top --
IssueLinksEnd: /print/archive
ContentsURL: .*/wp-dyn/print/.*
ContentsStart: -- date --
ContentsEnd: -- end print/body --
StoryURL: .*/wp-dyn/articles/.*html
# Below are ContentsSkipURL lines for all sections of the Post.
# Uncomment any of these to skip the sections indicated.
# Note: Uncomment only the "ContentsSkipURL" lines, NOT the names of the
# sections! See "Front Page Image" for an example.
#
# Front Page
# ContentsSkipURL: .*/wp-dyn/print/a1.*
#
# Front Page Image (no stories here, skipped by default)
ContentsSkipURL: .*/wp-dyn/print/image.*
#
# Inside the A Section
# ContentsSkipURL: .*/wp-dyn/print/asection.*
#
# Nation and Politics
# ContentsSkipURL: .*/wp-dyn/print/nationpolitics.*
#
# Editorial Pages
# ContentsSkipURL: .*/wp-dyn/print/editorials.*
#
# World
# ContentsSkipURL: .*/wp-dyn/print/world.*
#
# Business
# ContentsSkipURL: .*/wp-dyn/print/business.*
#
# Metro
# ContentsSkipURL: .*/wp-dyn/print/metro.*
#
# Sports
# ContentsSkipURL: .*/wp-dyn/print/sports.*
#
# Style
# ContentsSkipURL: .*/wp-dyn/print/style.*
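
To see what uncommenting a ContentsSkipURL line would do before running a full scoop, the patterns can be tried against section URLs in a few lines of Python. This is a sketch under assumptions, not a description of sitescooper's internals: it assumes each pattern has to match the whole URL (hence re.fullmatch), and the function name is_scooped is invented for the example.

```python
import re

# Patterns copied from the three-level .site file above.
CONTENTS_URL = r".*/wp-dyn/print/.*"
SKIP_URLS = [r".*/wp-dyn/print/image.*"]  # the one skip that is on by default


def is_scooped(url: str, skip_patterns=SKIP_URLS) -> bool:
    """Guess whether a contents page would be followed.

    Assumption: a URL is followed when it matches ContentsURL in full and
    matches none of the active ContentsSkipURL patterns.
    """
    if not re.fullmatch(CONTENTS_URL, url):
        return False
    return not any(re.fullmatch(p, url) for p in skip_patterns)


if __name__ == "__main__":
    base = "http://www.washingtonpost.com/wp-dyn/print/"
    # Pretend the Sports skip line has been uncommented as well.
    active = SKIP_URLS + [r".*/wp-dyn/print/sports.*"]
    for section in ("a1", "image", "sports", "style"):
        print(section, is_scooped(base + section + "/", active))
```

Adding a pattern to the active list corresponds to uncommenting the matching ContentsSkipURL line in the file.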
