Hello,

I'm using Nutch's Fetcher for my Simpy.com project, and one of the
things I'd like to do is detect broken links (any type of error: wrong
host name, 404, 500, 302, etc.).  From what I can tell, only successful
fetches (200s, and maybe 301/302s that result in a 200) end up being
written to disk, while all other links don't get stored anywhere (ASF
SVN is down right now, so I can't double-check this).

What's the best place to plug in some code to grab the broken links as
their fetches are failing?  I looked at Fetcher.java this morning and
saw the handleFetch and handleNoFetch methods.  Are these the best places
to add code for my purposes?  I'm not too familiar with Nutch's plugin
system, but can I write a plugin that plugs into those two methods?
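
To make concrete what I'm after, here is a minimal sketch of the kind of
collector I'd like to call from wherever failed fetches surface.  It is
standalone Java, not Nutch code: BrokenLinkLog, record, isBroken, and
NO_RESPONSE are hypothetical names of mine, and treating every non-2xx
final status as broken is only an assumption for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical collector for broken links; not part of Nutch. */
final class BrokenLinkLog {
    /** Assumed marker for failures with no HTTP response (e.g. unknown host). */
    static final int NO_RESPONSE = -1;

    // URL -> final status seen for that URL, in the order first recorded.
    private final Map<String, Integer> broken = new LinkedHashMap<>();

    /** Assumption: anything outside 200-299 counts as broken. */
    static boolean isBroken(int status) {
        return status < 200 || status >= 300;
    }

    /** Remembers the URL only if its status counts as broken. */
    void record(String url, int status) {
        if (isBroken(status)) {
            broken.put(url, status);
        }
    }

    /** Returns the broken URLs and their statuses collected so far. */
    Map<String, Integer> brokenLinks() {
        return broken;
    }
}
```

The idea would be to call record(url, status) from the failure path and
dump brokenLinks() at the end of the run, if such a hook point exists.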

Or is there a way to give Nutch a URL and get its HTTP status response
code back after fetching, merging, indexing, and optimizing are done?

Thanks,
Otis

____________________________________________________________________
Simpy -- simpy.com -- tags, social bookmarks, personal search engine
