"Larry W. Virden" said:

> I have been telling sitescooper just to look at the sample profiles and
> sites because I don't really understand these two configuration parms.
> However, it appears that sitescooper is doing a LOT more work than it 
> should because of this - it appears to be wandering around the internet
> fetching all sorts of stuff.
> 
> Can someone (off line would be fine) help me understand better how best
> to make use of these two configuration parms?

Hi Larry -- 

I think you've set the SitesDir: configuration parameter to point at the
site_samples tree.  Probably better not to do that, as sitescooper will
scoop every one of the 500-odd sites ;)

cf: http://sitescooper.org/doc/running.html#sitesdir

As that text says, SitesDir is present for backwards-compatibility, and is
deprecated.


Here are the details of how the more "modern" way works. There are
(basically) three ways to run sitescooper:

  sitescooper -sites foo.site [...]

    to pick the sites you want to scoop, on the command line itself; handy
    for picking up stuff from a specific site. cf.
    http://sitescooper.org/doc/running.html#sitesoncmdline

  sitescooper http://www.some.url.com/foo.html

    to scoop a page, quickly. cf.
    http://sitescooper.org/doc/running.html#scoopaurl

  sitescooper

    where sitescooper will use your site_choices.txt file to decide which
    sites to scoop. cf. http://sitescooper.org/doc/running.html#thecommand

The last of these is what's recommended for typical, daily use.  Basically,
you want to edit the file ~/.sitescooper/site_choices.txt, which lists your
"default" set of sites to scoop from, used when no sites are listed on the
command line.

The format of the file is:

    [x] Name Of Site
        URL: http://url.of.site/
        Filename: [samples]/site/file_location.site

for example:

    [x] Linux Weekly News
        URL: http://www.lwn.net/archives/
        Filename: [samples]/linux/weekly_news.site

If there's an "x" in the box beside the name, the site is scooped;
otherwise, if there's just an empty space, it will not be scooped.  I'd
guess your site_choices.txt file has several "x"s in there.
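If you want a quick way to see which entries you've actually ticked, here's
a rough Python sketch (not part of sitescooper itself) that reads text in the
format above and reports the selected sites. It assumes each entry is a
"[x]" or "[ ]" line followed by indented "Key: value" lines; a real
site_choices.txt may contain other text (comments, headers) that this
doesn't try to handle, and "Some Other Site" below is made up.

```python
import re

# A checkbox line, e.g. "[x] Linux Weekly News" or "[ ] Some Other Site".
CHECKBOX = re.compile(r"^\s*\[(.)\]\s*(.+?)\s*$")
# An indented "Key: value" line, e.g. "URL: http://www.lwn.net/archives/".
FIELD = re.compile(r"^\s+(\w+):\s*(.*?)\s*$")


def parse_site_choices(text):
    """Return one dict per site entry, with a 'selected' flag for "[x]"."""
    sites = []
    current = None
    for line in text.splitlines():
        box = CHECKBOX.match(line)
        if box:
            # Start a new entry; an "x" (either case) in the box means selected.
            current = {"name": box.group(2),
                       "selected": box.group(1).lower() == "x"}
            sites.append(current)
            continue
        field = FIELD.match(line)
        if field and current is not None:
            # Attach URL:, Filename:, etc. to the most recent entry.
            current[field.group(1)] = field.group(2)
    return sites


SAMPLE = """\
    [x] Linux Weekly News
        URL: http://www.lwn.net/archives/
        Filename: [samples]/linux/weekly_news.site

    [ ] Some Other Site
        URL: http://example.com/
        Filename: [samples]/misc/other.site
"""

if __name__ == "__main__":
    for site in parse_site_choices(SAMPLE):
        if site["selected"]:
            print(site["name"], site["Filename"])
```

Run against the sample, only the Linux Weekly News entry is printed, since
it's the only one with an "x" in its box.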

You should comment out the "SitesDir" line in your configuration as well;
otherwise sitescooper will still go off and download thousands of pages!

HTH,

--j.

_______________________________________________
Sitescooper-talk mailing list
http://lists.sourceforge.net/lists/listinfo/sitescooper-talk