Dear Nutch Users,

FYI, I blogged yesterday about an interesting use case of Nutch. We've
helped the guys at SimilarPages use Nutch on EC2 for a super large crawl
(3 billion docs parsed), which they then used with a bit of MapReduce
magic to find similarities between web pages.

I will probably add a Use Case section to the Wiki and write a short
description of the project, but in the meantime you can find more details at
http://digitalpebble.blogspot.com/2010/09/similarpages-is-out.html and, of
course, at http://www.similarpages.com/ itself.

Best,

Julien Nioche

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
