"McIntosh, Jim" said:
> Thanks for all the responses to my request. I don't seem to have resolved
> the problem though; I think I must be misunderstanding something fundamental
> about sitescooper's operation.
> I tried the following site file (guardfull.site):
>
> URL: http://www.guardian.co.uk/guardian/todays_stories
> Name: guardianfull
> Levels: 3
> ContentsURL : http://www.guardian.co.uk/.*/story/.*\.(htm|html)
> StoryURL: http://www.guardian.co.uk/Print/.*\.(html|htm)
Aha! Sorry, this one's *really* simple, I should have spotted it straight
away. ;)
Basically -- there's an extra space character between "ContentsURL" and
the colon after it:
> ContentsURL : http://www.guardian.co.uk/.*/story/.*\.(htm|html)
             ^
This is confusing sitescooper, causing it to use the default pattern for
ContentsURL -- i.e. "anything on the site", which isn't what you want.
If you use
ContentsURL: http://www.guardian.co.uk/.*/story/.*\.(htm|html)
it works perfectly.
I'd better modify the sitescooper site file parser to catch this...
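Roughly the kind of tolerance I have in mind, as a sketch only: this is
Python for illustration, not sitescooper's actual parser code, and the
parse_site_line name and the regular expression are made up for this
example. It assumes each site file line is a simple "Key: value" pair and
accepts optional whitespace on either side of the first colon after the key:

```python
import re

# Hypothetical helper, not sitescooper's real parser.
# Key = a word at the start of the line; whitespace is allowed before
# and after the colon that follows it. The value may contain colons.
_LINE_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*:\s*(.*?)\s*$")


def parse_site_line(line):
    """Return (key, value), or None if the line is not a 'Key: value' pair."""
    match = _LINE_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Both spellings yield the same key, so the stray space no longer matters.
with_space = parse_site_line(r"ContentsURL : http://www.guardian.co.uk/.*/story/.*\.(htm|html)")
without_space = parse_site_line(r"ContentsURL: http://www.guardian.co.uk/.*/story/.*\.(htm|html)")
```

With something like this, a line such as "ContentsURL : ..." is read as
the ContentsURL key rather than being skipped, so the default pattern is
not silently used in its place.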
--j.
_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/sitescooper-talk