Focussed Web Crawling with Nutch

2009-07-31 Thread Alex McLintock
I've been using a Perl-based focussed web crawler with a MySQL back end, but I am now looking at Nutch instead. It seems that a few other people have done something similar. I'm wondering whether we could pool our resources and work together on this. It seems to me that we would be building a few

Re: Focussed Web Crawling with Nutch

2009-07-31 Thread Ken Krugler
Hi Alex, There has been discussion of focused web crawling with Nutch in the past, so you probably want to check the archives. The key aspect is using the scoring plugin API to rate pages (and the outlinks from those pages); the ratings can then be used to do a more efficient job of fetching pages that are
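
To make the idea of score-driven fetching concrete, here is a minimal, language-neutral sketch in Python of a crawl frontier that always hands out the highest-rated URL next. It is not Nutch code and does not use the Nutch scoring plugin API; the `Frontier` class and its method names are hypothetical, chosen only for illustration.

```python
import heapq
import itertools


class Frontier:
    """Hypothetical crawl frontier: URLs come out in descending score order."""

    def __init__(self):
        self._heap = []                 # entries are (-score, tie_breaker, url)
        self._best = {}                 # url -> best score queued so far
        self._done = set()              # urls already handed out for fetching
        self._tie = itertools.count()   # keeps ordering stable for equal scores

    def add(self, url, score):
        """Queue a URL, keeping only the best score seen for it."""
        if url in self._done:
            return
        if score <= self._best.get(url, float("-inf")):
            return
        self._best[url] = score
        heapq.heappush(self._heap, (-score, next(self._tie), url))

    def pop(self):
        """Return (url, score) for the best unfetched URL, or None if empty."""
        while self._heap:
            neg_score, _, url = heapq.heappop(self._heap)
            if url in self._done:
                continue                # stale entry with an older, lower score
            self._done.add(url)
            return url, -neg_score
        return None
```

A page that is re-scored upwards simply gets a second heap entry; the older, lower entry is skipped when it surfaces, which avoids having to reorder the heap in place.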

Re: Focussed Web Crawling with Nutch

2009-07-31 Thread MilleBii
I do something like this: I update the URL scores using my own algorithm, which works on the parse data. It works great.
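
As a rough illustration of scoring URLs from parse data, the sketch below rates a page by the fraction of its words that match a topic vocabulary, then gives each outlink a blend of the page score and its anchor-text score. This is an assumption-laden stand-in, not the algorithm described in the message above and not a Nutch plugin: `TOPIC_TERMS`, `relevance`, `score_outlinks`, and the 50/50 weighting are all hypothetical.

```python
import re

# Hypothetical topic vocabulary; a real focussed crawl would use its own terms.
TOPIC_TERMS = {"crawler", "nutch", "search"}


def relevance(text, terms=TOPIC_TERMS):
    """Fraction of words in `text` that belong to the topic vocabulary."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in terms)
    return hits / len(words)


def score_outlinks(page_text, outlinks, terms=TOPIC_TERMS):
    """Score each outlink from its parent page text and its anchor text.

    `outlinks` is a list of (url, anchor_text) pairs. The equal weighting of
    page and anchor relevance is an arbitrary choice for this sketch.
    """
    page_score = relevance(page_text, terms)
    scores = {}
    for url, anchor in outlinks:
        score = 0.5 * page_score + 0.5 * relevance(anchor, terms)
        # If a URL is linked more than once, keep its best score.
        scores[url] = max(score, scores.get(url, 0.0))
    return scores
```

The resulting scores could be fed to whatever decides the fetch order, so that links found on relevant pages, or links with relevant anchor text, are fetched sooner.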