Dear nutch-user readers,
I have a question for everyone here: Is the current Nutch crawler
(Fetcher/Fetcher2) flexible enough for your needs?
If not, what would you like to see it do?
I'm asking because, last week, I suggested that the Nutch crawler
could be much more useful to many people if it were structured more as
a "crawler construction toolkit". But I realize that my comments
could seem like sour grapes unless there's some plan for moving
forward. So, I thought I'd just ask everybody what you think and
tally the results.
What kind of crawls would you like to do that aren't supported? I'll
start with some nonstandard crawls I've done:
1) Outlinks-only crawl: crawl a specific website, keep only the
outlinks from articles (, etc)
2) Crawl into CGIs w/o infinite crawl -- via crawl-depth filter
3) Plug in a "feature detector" (address, date, brand-name, etc) and
use this signal to guide the crawl
4) .... (fill in your own here!)
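
To make (2) a bit more concrete, here is a rough sketch of what a
crawl-depth filter could look like. It is plain Python over a made-up
in-memory link graph, not Nutch code: `LINKS`, `crawl`, `max_depth` and
`get_outlinks` are all hypothetical names chosen for illustration, and a
real crawler would fetch and parse pages instead of looking them up in a
dict.

```python
from collections import deque

# Hypothetical link graph standing in for a site whose CGI generates
# an endless chain of "next page" links.
LINKS = {
    "http://example.com/": ["http://example.com/cgi?page=1"],
    "http://example.com/cgi?page=1": [
        "http://example.com/cgi?page=2",
        "http://other.org/a",
    ],
    "http://example.com/cgi?page=2": ["http://example.com/cgi?page=3"],
    "http://example.com/cgi?page=3": ["http://example.com/cgi?page=4"],
}


def crawl(seed, max_depth, get_outlinks):
    """Breadth-first crawl that stops expanding pages at max_depth hops."""
    seen = {seed}
    queue = deque([(seed, 0)])
    fetched = []
    while queue:
        url, depth = queue.popleft()
        fetched.append((url, depth))
        if depth >= max_depth:
            # Depth filter: keep the page, but do not follow its outlinks.
            continue
        for link in get_outlinks(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched


if __name__ == "__main__":
    for url, depth in crawl("http://example.com/", 2, lambda u: LINKS.get(u, [])):
        print(depth, url)
```

With `max_depth=2` the crawl reaches `cgi?page=2` but never requests
`cgi?page=3`, so the CGI chain cannot run away.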
--
Matt Kangas / [EMAIL PROTECTED]