Hello, I'm using Nutch's Fetcher for my Simpy.com project, and one of the things I'd like to do is detect broken links (any type of error: wrong host name, 404, 500, 302, etc.). From what I can tell, only successful fetches (200s, and maybe 301/302s that end in a 200) are written to disk, while failed links are not stored anywhere. I can't double-check this because ASF SVN is down right now.
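For concreteness, here is a minimal sketch of that definition of "broken" as a plain Java rule. It is not Nutch code. `BrokenLinkRule`, `isBroken`, `describe` and the `NO_RESPONSE = -1` marker for "no HTTP response at all" (for example, a wrong host name) are hypothetical names. The sketch assumes that only a final 200 counts as healthy, so an unfollowed 302 is reported, matching the list above. Whether a redirect that eventually lands on a 200 should count as broken is a policy choice.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Nutch): classifies a fetch outcome
 * under the "anything but 200 is broken" rule.
 */
public final class BrokenLinkRule {

    /** Assumed marker for "no HTTP response" (e.g. the host name did not resolve). */
    public static final int NO_RESPONSE = -1;

    private BrokenLinkRule() {}

    /** Only a plain 200 is healthy; 3xx, 4xx, 5xx and NO_RESPONSE are all reported. */
    public static boolean isBroken(int statusCode) {
        return statusCode != 200;
    }

    /** Human-readable reason, useful when logging the broken link. */
    public static String describe(int statusCode) {
        if (statusCode == NO_RESPONSE) return "no response (e.g. wrong host name)";
        if (statusCode >= 500) return "server error " + statusCode;
        if (statusCode >= 400) return "client error " + statusCode;
        if (statusCode >= 300) return "redirect " + statusCode;
        if (statusCode == 200) return "ok";
        return "other " + statusCode;
    }

    public static void main(String[] args) {
        for (int code : List.of(200, 302, 404, 500, NO_RESPONSE)) {
            System.out.println(code + " -> broken=" + isBroken(code) + " (" + describe(code) + ")");
        }
    }
}
```

Keeping the rule separate from the fetching code means the same check can be reused wherever the failure is observed, whether inside the fetch loop or in a separate pass.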
What's the best place to plug in some code that grabs broken links as their fetches fail? I looked at Fetcher.java this morning and saw the handleFetch and handleNoFetch methods. Is that the right place to add code for my purposes? I'm not too familiar with Nutch's plugin system; can I write a plugin that hooks into those two methods? Or is there a way to give Nutch a URL and get its HTTP status code back after fetching, merging, indexing, and optimizing are done?

Thanks,
Otis
____________________________________________________________________
Simpy -- simpy.com -- tags, social bookmarks, personal search engine
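As a point of comparison outside Nutch, a URL's status code can be probed directly with the JDK's `HttpURLConnection`. This is only a sketch and does not use any Nutch API. `StatusProbe` and `fetchStatus` are hypothetical names, and -1 is an assumed marker for "no HTTP response". It assumes a HEAD request is acceptable; some servers mishandle HEAD, so GET may be needed in practice. Redirects are deliberately not followed, so a 301/302 is reported as-is rather than silently resolved.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

/**
 * Sketch only (not Nutch): returns the HTTP status code of a single URL,
 * or -1 if no HTTP response was received.
 */
public final class StatusProbe {

    private StatusProbe() {}

    public static int fetchStatus(String url) {
        HttpURLConnection conn = null;
        try {
            URLConnection c = URI.create(url).toURL().openConnection();
            if (!(c instanceof HttpURLConnection)) {
                return -1;                            // not an http(s) URL
            }
            conn = (HttpURLConnection) c;
            conn.setRequestMethod("HEAD");            // headers only, no body
            conn.setInstanceFollowRedirects(false);   // report 301/302 instead of following them
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode();            // 404/500 are returned, not thrown
        } catch (IOException | IllegalArgumentException e) {
            // UnknownHostException (wrong host name), refused connections,
            // timeouts and malformed URLs all end up here.
            return -1;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        for (String url : args) {
            System.out.println(fetchStatus(url) + "\t" + url);
        }
    }
}
```

Usage: `java StatusProbe http://example.com/ http://example.com/missing` prints one `status<TAB>url` line per argument. A standalone probe like this re-fetches every URL, so it duplicates work the Fetcher has already done; capturing the status at fetch time avoids that.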
