hi,

After many experiments I still have not figured out how to scoop a site I am
interested in.

The base URL is: www.pro-linux.de

On that page there is only one link of interest,
www.pro-linux.de/news/old/index.html, and I want to start from there.

This page links to content pages named by the short form of the month, grouped
in directories by year. Examples are:
http://www.pro-linux.de/news/old/2001/Apr.html
or http://www.pro-linux.de/news/old/2000/Dez.html

The page www.pro-linux.de/news/old/index.html grows by one link every month
(the new month).

The monthly pages, like the current one for May
(http://www.pro-linux.de/news/old/2001/Mai.html), link to the individual
stories, for example: http://www.pro-linux.de/news/2001/2999.html

The monthly pages grow every day.

My idea now is to write a site file for 3 levels: start with
www.pro-linux.de/news/old/index.html, get the issue pages such as
http://www.pro-linux.de/news/old/2001/Mai.html (and also next month's
http://www.pro-linux.de/news/old/2001/Jun.html, and so on), and finally get
the story pages linked from the issue pages.

I created the following site file:

# This is a sitescooper site file. see http://sitescooper.tsx.org/
# by Michael Tepperis-von der Ohe, Version 0.1, 20.04.2001
URL: http://www.pro-linux.de/news/old/index.html
  Name: pro-linux
  Levels: 3
  IssueFollowLinks: 1
  IssueURL: http://www.pro-linux.de/news/old/index.html
  IssueURL: http://www.pro-linux.de/news/old/2001/Apr\.html
  IssueURL: http://www.pro-linux.de/news/old/2001/Mai\.html
  StoryURL: http://www.pro-linux.de/news/2001/\d+\.html             
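
Note that the StoryURL line hardcodes the year 2001, so stories from other years (presumably under /news/2000/ and so on) would not match it. One way to check patterns before a sitescooper run is to try them on the sample URLs. The following is a minimal sketch using Python's `re`, assuming sitescooper treats these lines as Perl-style regular expressions (`\d`, `+` and escaped dots behave the same in both). The `\d{4}` variant and the 2000 story URL are hypothetical, made up for illustration.

```python
import re

# Pattern as written in the site file (dots escaped): the year is fixed to 2001.
STORY_2001 = re.compile(r"http://www\.pro-linux\.de/news/2001/\d+\.html")

# Hypothetical generalisation: accept any four-digit year directory.
STORY_ANY_YEAR = re.compile(r"http://www\.pro-linux\.de/news/\d{4}/\d+\.html")

samples = [
    "http://www.pro-linux.de/news/2001/2999.html",     # story linked from Mai.html
    "http://www.pro-linux.de/news/2000/1234.html",     # hypothetical story from 2000
    "http://www.pro-linux.de/news/old/2001/Mai.html",  # issue page, not a story
]

for url in samples:
    # fullmatch: the pattern has to cover the whole URL, not just a prefix.
    print(url,
          "2001-only:", bool(STORY_2001.fullmatch(url)),
          "any-year:", bool(STORY_ANY_YEAR.fullmatch(url)))
```

`fullmatch` is used so each pattern must describe the complete URL; whether sitescooper anchors its patterns the same way is not confirmed here.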

Better would be something like this:
  IssueURL: http://www.pro-linux.de/news/old/\d+/(Jan|Feb|M.r|Apr|Mai|Jun|Jul|Aug|Sep|Okt|Nov|Dez)\.html
but that is the advanced version, and I can't even get the basics to work :(
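
The general IssueURL pattern can be checked the same way. Joined onto a single line (in the mail it is split only by line wrapping), it should accept every monthly issue page and reject both the index page and the story pages. The sketch below escapes the dots in the host name, which the proposed line does not; `M.r` is kept as written, since the exact spelling of the March page is not known from this post.

```python
import re

# The proposed IssueURL pattern on one line, with the host-name dots escaped
# so that "." only matches a literal dot there.
ISSUE = re.compile(
    r"http://www\.pro-linux\.de/news/old/\d+/"
    r"(Jan|Feb|M.r|Apr|Mai|Jun|Jul|Aug|Sep|Okt|Nov|Dez)\.html"
)

should_match = [
    "http://www.pro-linux.de/news/old/2001/Apr.html",
    "http://www.pro-linux.de/news/old/2000/Dez.html",
    "http://www.pro-linux.de/news/old/2001/Mai.html",
    "http://www.pro-linux.de/news/old/2001/Jun.html",
]
should_not_match = [
    "http://www.pro-linux.de/news/old/index.html",  # level 1: the index page
    "http://www.pro-linux.de/news/2001/2999.html",  # level 3: a story page
]

for url in should_match:
    assert ISSUE.fullmatch(url), url
for url in should_not_match:
    assert not ISSUE.fullmatch(url), url
print("pattern OK")
```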

greetings

 Michael 

----------------------------------------------------------------------------
Michael Tepperis-von der Ohe                        Logica-pdv
email: [EMAIL PROTECTED]                              www.logica-pdv.de

_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/sitescooper-talk
