Bill Janssen said:

> > I'm having terrible luck writing site files that use ContentStart/End
> > patterns that use any HTML constructs, like "<B>Contents</B>".
> > Sitescooper just doesn't find the pattern, even though when I download
> > the page, it's there all right.  Anyone know why?
> 
> Just as a follow-up, when I use "#" in the pattern, it seems to
> truncate just before that character (and also fail).

Yeah, # is the comment character in site files, so everything from the #
to the end of the line is ignored.  This is a bit of an FAQ, so I'll add
it to the docs.
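
To illustrate why the pattern gets cut off, here is a minimal Python
sketch of a naive comment-stripping line parser.  It is not sitescooper's
actual parsing code; strip_comment and parse_setting are hypothetical
names, and the sample pattern is made up.  It only assumes that everything
from the first # onward is discarded before the "Key: value" line is split.

```python
def strip_comment(line):
    """Drop everything from the first '#' onward (naive comment handling)."""
    return line.split("#", 1)[0].rstrip()


def parse_setting(line):
    """Split a 'Key: value' line into (key, value) after removing comments."""
    key, _, value = strip_comment(line).partition(":")
    return key.strip(), value.strip()


# A pattern with no '#' survives intact.
print(parse_setting('ContentsStart: <B>Contents</B>'))

# A pattern containing '#' is cut off just before that character,
# so the truncated pattern can no longer match the page text.
print(parse_setting('ContentsStart: <A NAME="#top">Contents</A>'))
```

Under that assumption the second pattern ends up as just '<A NAME="',
which matches the "truncates just before that character" behaviour
described above.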

BTW, there should be no other problems with the ContentsStart/End
patterns, though.  HTML tags are just matched as plain text.  You could
try using the "-admin journal" flag, which writes a journal file to your
~/.sitescooper directory as it's scooping.  This file will contain every
bit of text encountered during the scooping process, so you can open it
in an editor and see exactly what sitescooper encountered, and what text
was stripped (if any).

--j.
_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/mailman/listinfo/sitescooper-talk
