Good day, I'm trying to use Nutch to build a niche search engine, and I would like to have full control over URLs. I would like to precisely control which URLs get crawled, followed, stored, and indexed. Is it possible to do this as a plug-in? What should I be reading, and where, to do this?
Coding my side of the logic is trivial, but I have no idea yet how to interface with Nutch. So far I have only done a basic 'Intranet Crawl' (with which I had a slight problem that I'll post about later) and followed that with a command-line search using NutchBean. But I want more control than simply feeding urls/nutch and conf/crawl-urlfilter.txt allows.
-- Fedora 13 (www.pembo13.com)

