I have created individual url-filters to specify exactly which pages I
want from each site, and I wrote a script that swaps the different
filters in and out when I crawl.  That way I'm sure never to go off
site.
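
Roughly, the switching script does something like the sketch below.  The
filter file names, seed lists, and crawl flags here are illustrative (my
assumptions from the 0.x intranet-crawl setup), so check them against
your own install:

    for site in siteA siteB; do
        # swap in the per-site filter (one crawl-urlfilter file kept per site)
        cp conf/crawl-urlfilter.$site.txt conf/crawl-urlfilter.txt
        # run the one-shot crawl tool with that site's seed list
        bin/nutch crawl seeds/$site.txt -dir crawl-$site -depth 3
    done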

On 5/13/05, EM <[EMAIL PROTECTED]> wrote:
> Sounds fine to me, although more experienced people here may have a
> different opinion.
> 
> One small thing: if you are setting up each site individually, then
> fully disable the spidering. That way, you can inject individual sites
> yourself.
> 
> Good luck,
> Emilijan
> Ian Reardon wrote:
> 
> >I am going to crawl a small set of sites.  I never want to go off
> >site, and I also want to strictly control my link depth.
> >
> >I set up crawls for each site using the crawl command, then manually
> >move the segments folder to my "master" directory and re-index.  (This
> >can all be scripted.)  This gives me the flexibility to QA each
> >individual crawl.
> >
> >Am I jumping through unnecessary hoops here or does this sound like a
> >reasonable plan?
> >
> >
>
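
The move-and-reindex step Ian describes above can be scripted along these
lines.  The directory layout and command names are my recollection of the
0.x tools, so verify them against your version before relying on this:

    MASTER=crawl-master
    mkdir -p $MASTER/segments
    for site in siteA siteB; do
        # pull each per-site crawl's segments into the master directory
        cp -r crawl-$site/segments/* $MASTER/segments/
    done
    # re-index every segment now under the master directory
    for seg in $MASTER/segments/*; do
        bin/nutch index $seg
    done
    # a dedup/merge pass over the per-segment indexes may be needed afterwards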

