[EMAIL PROTECTED] said:
> Whew! After fixing the iSilo on Win2K debacle I have been rolling using
> sitescooper, it's awesome so far.

cool!

> But, I hit a bit of a snag. There is a
> site located at www.tampatrib.com that I am having trouble with.
> Specifically, the site is www.tampatrib.com/floridametronews/*.html where
> the * indicates the day of the week. The index page is
> http://www.tampatrib.com/FloridaMetro/index.shtml which lists only a few
> stories with far more comprehensive coverage under the "index of Saturday
> Metro Stories" (or Sunday, or Tuesday, etc.) page which would be designated
> as http://www.tampatrib.com/floridametronews/saturday.htm.

man, that is pathological. this is very tricky.

There are two ways to do it.

1 -- the easiest. Just scoop the top page, and cut out *everything* except
the link to the day page using IssueLinksStart and IssueLinksEnd.

alternatively,

2 -- harder. Use a "fake" URL for the top page, and use a URLProcess:
command to replace the fake bits with the real day name. This works:

URL: http://www.tampatrib.com/floridametronews/__CURRENTDAY__.htm
Name: Florida Metro News
URLProcess: {
  # get today's full name
  # ($time[6] is the day of the week, 0 = sunday)
  my @time = localtime(time);
  my @daynames = qw(sunday monday tuesday wednesday thursday friday saturday);
  my $todayname = $daynames[$time[6]];

  # and replace the magic token in the URL:
  s/__CURRENTDAY__/$todayname/;
}

--j.

_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/sitescooper-talk
