Re: [Nutch-general] Motivation for Crawl-urlfilter.txt

Andrzej Bialecki Wed, 04 Oct 2006 16:43:44 -0700

Jared Dunne wrote:
> Can someone explain the motivation for the nutch crawl command to use a
> different file, namely crawl-urlfilter.txt, instead of just using
> regex-urlfilter.txt?
>
> Seems unnecessary, but I'm assuming there's a compelling reason.
>


Originally, the 'crawl' command was intended for simple, one-step 
Intranet crawling. URL filters for that case would be very different from 
those used for Internet-wide crawling, so a separate config file seemed 
appropriate.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
