Plug-in for complete user control

Arthur Pemberton Tue, 10 Aug 2010 04:33:17 -0700

Good day,

I'm trying to use Nutch to build a niche search engine, and I would
like to have full control over URLs. I would like to precisely control
which URLs get crawled, followed, stored and indexed. Is it possible to
do this as a plug-in? What and where should I be reading to do this?


Coding my side of the logic is trivial, but I have no idea yet how to
interface with Nutch. So far I have just done a basic 'Intranet Crawl'
(with which I had a slight problem which I'll post about later) and
followed that with a command line search using NutchBean.

But I want more control than simply feeding urls/nutch and
conf/crawl-urlfilter.txt gives me.
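
To illustrate the kind of per-URL control I have in mind, here is a
minimal sketch of my side of the logic. Every name in it (UrlPolicy,
Stage, the example.org host, the /articles/ prefix) is hypothetical and
none of it is Nutch API; how to hook something like this into Nutch is
exactly what I am asking.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

/** Hypothetical per-stage URL policy; NOT part of Nutch. */
class UrlPolicy {

    /** The four decisions I would like to make for every URL. */
    enum Stage { CRAWL, FOLLOW, STORE, INDEX }

    // Assumed example host standing in for my niche site.
    private static final Set<String> ALLOWED_HOSTS =
            Collections.singleton("example.org");

    /** Returns true if the URL is allowed at the given stage. */
    static boolean allow(Stage stage, String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // malformed URLs are rejected at every stage
        }
        String host = uri.getHost();
        if (host == null
                || !ALLOWED_HOSTS.contains(host.toLowerCase(Locale.ROOT))) {
            return false; // off-site URLs are rejected at every stage
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        switch (stage) {
            case CRAWL:
            case FOLLOW:
                // Fetch and follow links from anything on the allowed host.
                return true;
            case STORE:
            case INDEX:
                // Keep only article pages (assumed path prefix).
                return path.startsWith("/articles/");
            default:
                return false;
        }
    }
}
```

With this policy, http://example.org/index.html would still be crawled
and its links followed, but it would be neither stored nor indexed.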

-- 
Fedora 13
(www.pembo13.com)
