I've attached two .site files.

washington_post.site scoops the contents of *all* sections of what the
Washington Post Web site calls the Print Edition.  (There aren't any images
in the Print Edition.)

washington_post_natpol.site scoops the Nation and Politics section of the
Print Edition of the Washington Post.  More importantly, it contains
instructions on how to use it as a template to scoop any single section you
want.  (Honestly, I didn't have the energy to create a .site file for every
single section!)
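
To illustrate the template approach, here is a minimal sketch of what a World
section file might look like after following the instructions in
washington_post_natpol.site.  The file name washington_post_world.site and the
Name and Description wording are illustrative assumptions; the URL and the
remaining lines are taken unchanged from the attached files.

```
# Washington Post World
# (hypothetical file name: washington_post_world.site)
# The Name and Description wording below is illustrative, not from the
# attached files.

URL: http://www.washingtonpost.com/wp-dyn/print/world/
Name: Washington Post World
Description: World News from the Washington Post
Levels: 2
ContentsStart: -- date --
ContentsEnd: -- end print/body --
StoryURL: .*/wp-dyn/articles/.*html
```

Only the URL, Name, and Description lines differ from the Nation and Politics
file.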

Hope these are useful.

Regards,
Tim Kynerd

Sunrise in Stockholm today:  7:30
Sunset in Stockholm today:  15:33
My rail transit photos at http://www.kynerd.nu
# Washington Post Nation and Politics
# Author:  Tim Kynerd, 9 November 2001 -- [EMAIL PROTECTED]
# This is a two-level site:  the top level consists of the contents pages
# for the section; the second level is the stories.
# This example site fetches the Nation and Politics section of the Post.
# It can be easily copied and modified, using the instructions below, to
# scoop other sections.

URL: http://www.washingtonpost.com/wp-dyn/print/nationpolitics/
Name: Washington Post Nation and Politics
Description: National News from the Washington Post
Levels: 2
ContentsStart: -- date --
ContentsEnd: -- end print/body --
StoryURL: .*/wp-dyn/articles/.*html

# Below are URL lines for all sections of the Post.
# To scoop any section of the Post, simply substitute the appropriate line
# (without the # at the beginning, which marks a comment) for the URL line
# above (the first uncommented line above).
# You'll also need to change the Name and Description lines to correspond to
# the section you're scooping, as well as the name of the .site file itself.
#
# Front Page
# URL: http://www.washingtonpost.com/wp-dyn/print/a1/
#
# Inside the A Section
# URL: http://www.washingtonpost.com/wp-dyn/print/asection/
#
# Nation and Politics
# URL: http://www.washingtonpost.com/wp-dyn/print/nationpolitics/
#
# Editorial Pages
# URL: http://www.washingtonpost.com/wp-dyn/print/editorials/
#
# World
# URL: http://www.washingtonpost.com/wp-dyn/print/world/
#
# Business
# URL: http://www.washingtonpost.com/wp-dyn/print/business/
#
# Metro
# URL: http://www.washingtonpost.com/wp-dyn/print/metro/
#
# Sports
# URL: http://www.washingtonpost.com/wp-dyn/print/sports/
#
# Style
# URL: http://www.washingtonpost.com/wp-dyn/print/style/
# Washington Post
# Author:  Tim Kynerd, 2 November 2001 -- [EMAIL PROTECTED]
# This is a three-level site:  the top level is the list of sections of the
# paper; the second level consists of the contents pages for the sections;
# the third level consists of the stories.
# On a typical weekday, with all sections included, this will fetch about 2 MB 
# of news.  In the Plucker format, this ends up as a database about 600-650 KB 
# in size.

URL: http://www.washingtonpost.com/wp-dyn/print/
Name: Washington Post
Description: News from the Washington Post
Levels: 3
IssueUseTableSmarts: 0
IssueLinksStart: -- End Top --
IssueLinksEnd: /print/archive
ContentsURL: .*/wp-dyn/print/.*
ContentsStart: -- date --
ContentsEnd: -- end print/body --
StoryURL: .*/wp-dyn/articles/.*html
# Below are ContentsSkipURL lines for all sections of the Post.
# Uncomment any of these to skip the sections indicated.
# Note:  Uncomment only the "ContentsSkipURL" lines, NOT the names of the
# sections!  See "Front Page Image" for an example.
#
# Front Page
# ContentsSkipURL: .*/wp-dyn/print/a1.*
# 
# Front Page Image (no stories here, skipped by default)
ContentsSkipURL: .*/wp-dyn/print/image.*
#
# Inside the A Section
# ContentsSkipURL: .*/wp-dyn/print/asection.*
#
# Nation and Politics
# ContentsSkipURL: .*/wp-dyn/print/nationpolitics.*
#
# Editorial Pages
# ContentsSkipURL: .*/wp-dyn/print/editorials.*
#
# World
# ContentsSkipURL: .*/wp-dyn/print/world.*
#
# Business
# ContentsSkipURL: .*/wp-dyn/print/business.*
#
# Metro
# ContentsSkipURL: .*/wp-dyn/print/metro.*
#
# Sports
# ContentsSkipURL: .*/wp-dyn/print/sports.*
#
# Style
# ContentsSkipURL: .*/wp-dyn/print/style.*
