Good day,

I'm trying to use Nutch to build a niche search engine, and I would
like to have full control over URLs. I would like to precisely control
which URLs get crawled, followed, stored and indexed. Is it possible to
do this as a plug-in? What and where should I be reading to do this?

Coding my side of the logic is trivial, but I have no idea yet how to
interface with Nutch. So far I have only done a basic 'Intranet Crawl'
(with which I had a slight problem which I'll post about later) and
followed that with a command line search using NutchBean.

But I want more control than simply supplying urls/nutch and
conf/crawl-urlfilter.txt gives me.
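
To make "my side of the logic" concrete, here is a minimal sketch of the
kind of per-URL decision I have in mind. It is plain Java and does not
touch any Nutch API; the class name NicheUrlPolicy, the host whitelist
and the "return null to reject" convention are all my own placeholders,
not anything taken from Nutch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-alone policy class; not part of Nutch.
public class NicheUrlPolicy {
    private final Set<String> allowedHosts;

    public NicheUrlPolicy(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts;
    }

    // Returns the URL unchanged if it is accepted, or null if it is rejected.
    public String filter(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return null; // relative or malformed for our purposes
            }
            boolean web = scheme.equalsIgnoreCase("http")
                    || scheme.equalsIgnoreCase("https");
            if (!web) {
                return null; // only web pages are of interest
            }
            // Accept only hosts on the whitelist (exact match, case-insensitive).
            return allowedHosts.contains(host.toLowerCase(Locale.ROOT)) ? url : null;
        } catch (URISyntaxException e) {
            return null; // unparseable URLs are rejected
        }
    }
}
```

What I am missing is where logic like this would plug into the crawl,
follow, store and index steps.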

-- 
Fedora 13
(www.pembo13.com)
