Dear Nutch Users,

FYI, I blogged yesterday about an interesting use case of Nutch. We helped the guys at SimilarPages use Nutch on EC2 for a super large crawl (3 billion docs parsed), which was then used with a bit of MapReduce magic to find similarities between web pages.
I will probably add a Use Case section to the Wiki and write a short description of the project, but in the meantime you can find more details at http://digitalpebble.blogspot.com/2010/09/similarpages-is-out.html and of course at http://www.similarpages.com/ itself.

Best,
Julien Nioche

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

