Dear nutch-user readers,

I have a question for everyone here: Is the current Nutch crawler (Fetcher/Fetcher2) flexible enough for your needs?
If not, what would you like to see it do?

I'm asking because, last week, I suggested that the Nutch crawler could be much more useful to many people if it were structured more as a "crawler construction toolkit". But I realize that my comments could seem like sour grapes unless there's some plan for moving forward. So I thought I'd just ask everybody what you think and tally the results.

What kind of crawls would you like to do that aren't supported? I'll start with some nonstandard crawls I've done:

1) Outlinks-only crawl: crawl a specific website, keep only the outlinks from articles (, etc)
2) Crawl into CGIs w/o infinite crawl -- via crawl-depth filter
3) Plug in a "feature detector" (address, date, brand-name, etc) and use this signal to guide the crawl

4) .... (fill in your own here!)
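
To make item 2 a bit more concrete, here is a rough sketch of what I mean by a crawl-depth filter. It is plain Python rather than anything from the Nutch codebase, and every name in it (crawl_with_depth_limit, get_outlinks, endless_cgi, the example.com URLs) is made up for illustration. The idea is simply to carry a hop count along with each URL and stop expanding outlinks once the count reaches a limit, so a CGI that generates links forever cannot trap the crawl.

```python
from collections import deque


def crawl_with_depth_limit(seeds, get_outlinks, max_depth):
    """Breadth-first crawl that stops following links past max_depth hops.

    get_outlinks is a hypothetical callback: given a URL, it returns the
    URLs found on that page. A real crawler would fetch and parse here.
    """
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    fetched = []
    while queue:
        url, depth = queue.popleft()
        fetched.append((url, depth))
        if depth >= max_depth:
            continue  # depth filter: keep the page, but do not expand it
        for link in get_outlinks(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched


def endless_cgi(url):
    """Toy stand-in for a CGI (e.g. a calendar) that always links to a 'next' page."""
    n = int(url.rsplit("=", 1)[1])
    return [f"http://example.com/cal.cgi?page={n + 1}"]


pages = crawl_with_depth_limit(
    ["http://example.com/cal.cgi?page=0"], endless_cgi, max_depth=3
)
for url, depth in pages:
    print(depth, url)
```

Without the depth check the loop above would never terminate on endless_cgi; with it, the crawl ends after the seed plus three further hops.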

--
Matt Kangas / [EMAIL PROTECTED]

