Re: [scoop] sitescooper.org problems, and -fullrefresh problems

Tim Kynerd Tue, 30 Oct 2001 06:13:06 -0800

On Tue, 30 Oct 2001, Barry Dexter A. Gonzaga wrote:

> goodday!
>
> On Mon, Oct 29, 2001 at 05:43:50PM +0100, Tim Kynerd wrote:
> > The first problem, with the Web site, remains AFAIK; I haven't been on the
> > Web site since Saturday, so I can't say for sure.
>
>       confirmed, anything inside /doc/ gets permissions denied.


OK, then I wasn't insane! ;-)

>
> > The second problem turned out to be initially a problem with my site file --
> > my regexps weren't getting matched (duh).  But even after I fixed them, I
> > wasn't picking up as much content as I would like.  I *think* this is
> > because the links I want scooped are in a table.  Will setting
> > "ContentsUseTableSmarts" to 0 solve this?  For the time being, I solved it
>
>       Try Setting "ContentsUseTableSmarts" to 0, as you stated, or use
>       ContentsStart/ContentsEnd.

I *think* that IssueLinksStart/IssueLinksEnd is what I needed.  Even though
the Washington Post site doesn't have different "issues," what I'm trying to
do is scoop the various *sections* of the paper, and each of those functions
like an issue, being essentially a table of contents with links to stories.

When I used IssueLinksStart/IssueLinksEnd, I got nothing, because the only
thing between those strings was -- yep -- a table containing the sections I
wanted to scoop.  I think sitescooper didn't pick them up because it ignored
the table, so I'm hoping ContentsUseTableSmarts: 0 will work.  I'll test
this as soon as I get a chance.
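
As a rough, untested sketch, the relevant part of the .site file would look
something like this.  The marker strings are placeholders, not the actual
text from the Washington Post pages, and whether ContentsUseTableSmarts
also applies at the issue-links level is an assumption still to be checked:

```
IssueLinksStart: PLACEHOLDER-text-just-before-the-section-table
IssueLinksEnd: PLACEHOLDER-text-just-after-the-section-table
ContentsUseTableSmarts: 0
```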

>
> > by copying the HTML page to my hard drive and editing it, then scooping that
>
>       kinda defeats sitescooping, doesn't it? ;)

Kinda ;-), but not as badly as you'd think.  The "section" links (see my
explanation above) are static links, with URLs that never change, so I just
have those in the HTML page on my hard drive and let it function as the top
level of a 3-level site.

This also has the advantage of letting me format that initial page any way I
want, rather than being, to some extent, "stuck" with what the designers of
the Washington Post's Web site came up with.

But, as I say, it restricts the general applicability of the .site file :-(.

>
> > file; this works beautifully.  But I was planning to contribute this site
> > file once I got it working, and I suppose I can't do so as long as it's
> > dependent on another file for the scooping to work -- or?
>
>
>       you could also look at the site_samples dir and use them as "guides"
> or do what I do: copy from them ;).

I did a bit of this while developing the .site file, but I'll do some more
looking to see what I can find.

Thanks, Barry.

-- Tim Kynerd

Sunrise in Stockholm today:  7:02
Sunset in Stockholm today:  16:00
My rail transit photos at http://www.kynerd.nu


_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/sitescooper-talk
