On Tue, 30 Oct 2001, Barry Dexter A. Gonzaga wrote:

> goodday!
>
> On Mon, Oct 29, 2001 at 05:43:50PM +0100, Tim Kynerd wrote:
> > The first problem, with the Web site, remains AFAIK; I haven't been on the
> > Web site since Saturday, so I can't say for sure.
>
> confirmed, anything inside /doc/ gets permissions denied.

OK, then I wasn't insane! ;-)

> > The second problem turned out to be initially a problem with my site file --
> > my regexps weren't getting matched (duh). But even after I fixed them, I
> > wasn't picking up as much content as I would like. I *think* this is
> > because the links I want scooped are in a table. Will setting
> > "ContentsUseTableSmarts" to 0 solve this? For the time being, I solved it
>
> Try setting "ContentsUseTableSmarts" to 0, as you stated, or use
> ContentsStart/ContentsEnd.

I *think* that IssueLinksStart/IssueLinksEnd is what I needed. Even though
the Washington Post site doesn't have different "issues," what I'm trying to
do is scoop the various *sections* of the paper, and each of those functions
like an issue, being essentially a table of contents with links to stories.

When I used IssueLinksStart/IssueLinksEnd, I got nothing because the only
thing between those strings was -- yep -- a table containing the sections I
wanted to scoop; I think sitescooper didn't pick them up because it ignored
the table -- so I'm hoping ContentsUseTableSmarts: 0 will work. I'll test
this as soon as I get a chance.

> > by copying the HTML page to my hard drive and editing it, then scooping that
>
> kinda defeats sitescooping, doesn't it? ;)

Kinda ;-), but not as badly as you'd think. The "section" links (see my
explanation above) are static links, with URLs that never change, so I just
have those in the HTML page on my hard drive and let it function as the top
level of a 3-level site. This also has the advantage of letting me format
that initial page any way I want, rather than being, to some extent, "stuck"
with what the designers of the Washington Post's Web site came up with. But,
as I say, it restricts the general applicability of the .site file :-(.

> > file; this works beautifully. But I was planning to contribute this site
> > file once I got it working, and I suppose I can't do so as long as it's
> > dependent on another file for the scooping to work -- or?
>
> you could also look at site_samples dir and use them as "guides"
> or do what i do copy from them ;).

I did a bit of this while developing the .site file, but I'll do some more
looking to see what I can find.

Thanks, Barry.

-- 
Tim Kynerd
Sunrise in Stockholm today: 7:02
Sunset in Stockholm today: 16:00
My rail transit photos at http://www.kynerd.nu
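
For readers following the thread, the three-level setup discussed above might look roughly like the fragment below. This is only a sketch under stated assumptions, not the site file from this thread: the URL, the Name, and the two marker strings are hypothetical placeholders, and the `URL`, `Name`, and `Levels` keys are assumed from the usual layout of the files in site_samples. Only `IssueLinksStart`, `IssueLinksEnd`, and `ContentsUseTableSmarts` are named in the messages themselves, and whether `ContentsUseTableSmarts: 0` actually cures the table problem at the section-list level was still untested at the time of writing.

```
# Hypothetical sketch only; every value here is a placeholder.
URL: http://www.example.com/sections.html
Name: Example Paper
Levels: 3

# Top level: the page listing the sections, each treated like an "issue".
# START-OF-SECTIONS and END-OF-SECTIONS stand in for real marker strings
# found in the page's HTML.
IssueLinksStart: START-OF-SECTIONS
IssueLinksEnd: END-OF-SECTIONS

# Suggested in the thread as a way to stop links inside a table from
# being skipped; not yet confirmed to work for this case.
ContentsUseTableSmarts: 0
```

If the setting does not help, the alternative mentioned in the thread is to narrow the scooped region with ContentsStart/ContentsEnd instead.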
