[EMAIL PROTECTED] said:

> Whew!  After fixing the iSilo on Win2K debacle I have been rolling along
> with sitescooper, and it's awesome so far.

cool!

> But I hit a bit of a snag.  There is a
> site, www.tampatrib.com, that I am having trouble with.
> Specifically, the pages are www.tampatrib.com/floridametronews/*.htm, where
> the * is the day of the week.  The index page,
> http://www.tampatrib.com/FloridaMetro/index.shtml, lists only a few
> stories; far more comprehensive coverage is on the "index of Saturday
> Metro Stories" (or Sunday, or Tuesday, etc.) page, which would be
> http://www.tampatrib.com/floridametronews/saturday.htm.

man, that is pathological, and very tricky.  There are two ways to do
it.

1 -- the easiest.  Just scoop the top page, and use IssueLinksStart and
IssueLinksEnd to cut out *everything* except the link to the day page.

alternatively, 2 -- harder.  Use a "fake" URL for the top page, and
use a URLProcess: block of Perl code to replace the fake token with
today's day name, so the day's index page becomes the top page.
This works:

  URL: http://www.tampatrib.com/floridametronews/__CURRENTDAY__.htm
  Name: Florida Metro News

  URLProcess: {
    # get today's full weekday name (element 6 of localtime() is 0 for Sunday)
    my @time = localtime(time);
    my @daynames = qw(sunday monday tuesday wednesday thursday friday saturday);
    my $todayname = $daynames[$time[6]];
    # and replace the magic token:
    s/__CURRENTDAY__/$todayname/;
  }
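
If you want to check what URL that produces before running a scoop, here's
a rough standalone Perl sketch of the same substitution.  expand_day_token()
is just a made-up helper name for testing, not part of sitescooper, and the
optional weekday argument (0 = Sunday, same as localtime's element 6) is only
there so you can try other days:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same day list as the URLProcess block; index 0 is Sunday.
my @DAYNAMES = qw(sunday monday tuesday wednesday thursday friday saturday);

# Hypothetical test helper (not a sitescooper function): replace the
# __CURRENTDAY__ token in $url with the name of weekday $wday.
# If $wday is omitted, use today's weekday from the local clock.
sub expand_day_token {
    my ($url, $wday) = @_;
    $wday = (localtime(time))[6] unless defined $wday;
    my $dayname = $DAYNAMES[$wday];
    $url =~ s/__CURRENTDAY__/$dayname/;
    return $url;
}

# Print today's URL so it can be pasted into a browser and checked.
print expand_day_token(
    'http://www.tampatrib.com/floridametronews/__CURRENTDAY__.htm'), "\n";
```

If the printed URL loads in a browser, the URLProcess: block above should be
fetching the same page.  Note that the day comes from the scooping machine's
clock, so it may not match the paper's day near midnight.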

--j.

_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/sitescooper-talk
