I've been using a Perl-based focused web crawler with a MySQL back
end, but am now looking at Nutch instead. It seems like a few other
people have done something similar. I'm wondering whether we could
pool our resources and work together on this?
It seems to me that we would be building a few
Hi Alex,
There has been discussion on focused web crawling using Nutch in the
past, so you probably want to check the archives.
The key aspect is using the scoring plugin API to rate pages (and outlinks
from pages), which can then be used to do a more efficient job of
fetching pages that are
I do something like this... I update the URL scores based on my own
algorithm which works on parse data.
Works great.
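
For anyone following along, here is a rough sketch of the general idea discussed above: score a page from its parsed text, pass that score on to its outlinks, and fetch the highest-scoring URLs first. It is plain Python, not Nutch's scoring plugin API, and everything in it (TOPIC_TERMS, relevance, Frontier, score_outlinks, the example URLs) is a hypothetical name made up for illustration. The scoring rule is a simple keyword fraction, standing in for whatever algorithm you actually use.

```python
import heapq

# Hypothetical topic vocabulary; a real crawler would use its own model.
TOPIC_TERMS = {"crawler", "nutch", "search"}


def relevance(text):
    """Fraction of whitespace-separated words that are topic terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in TOPIC_TERMS for w in words) / len(words)


class Frontier:
    """URL queue that hands out the highest-scoring URL first."""

    def __init__(self):
        self._heap = []      # entries are (-score, url); heapq is a min-heap
        self._best = {}      # url -> best score seen so far
        self._done = set()   # URLs already handed out for fetching

    def add(self, url, score):
        # Keep only the best score per URL; ignore URLs already fetched.
        if url in self._done:
            return
        if score > self._best.get(url, -1.0):
            self._best[url] = score
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        # Skip stale heap entries left behind by earlier, lower scores.
        while self._heap:
            neg_score, url = heapq.heappop(self._heap)
            if url in self._done or self._best[url] != -neg_score:
                continue
            self._done.add(url)
            return url, -neg_score
        return None


def score_outlinks(frontier, page_text, outlinks):
    """Give every outlink the relevance score of the page it came from."""
    score = relevance(page_text)
    for url in outlinks:
        frontier.add(url, score)


frontier = Frontier()
score_outlinks(frontier, "nutch crawler search tips", ["http://a.example/1"])
score_outlinks(frontier, "cooking recipes for pasta",
               ["http://b.example/1", "http://a.example/1"])
first = frontier.pop()
second = frontier.pop()
```

Here the link found on the on-topic page is fetched before the link found only on the off-topic page. In Nutch itself the equivalent logic would live in a scoring plugin rather than in a standalone queue like this.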
2009/7/31 Ken Krugler kkrugler_li...@transpac.com
Hi Alex,
There has been discussion on focused web crawling using Nutch in the past,
so you probably want to check the