Hi James,

On Fri, Feb 24, 2012 at 2:47 PM, Spadez <[email protected]> wrote:

> However, having found Nutch, it seems like this might be something worth
> looking at. Firstly, is Nutch simply a web scraper or does it integrate
> other aspects of Lucene as well? I'm wondering if I would need to install
> Nutch and Solr together, or if Nutch integrates the search system as well.
>

Nutch handles the crawling (of the web or a filesystem of your choice) and
Solr handles the indexing and search, so you would install both. Once Nutch
is set up, getting it to communicate with Solr is trivial. Please see this
tutorial for a comprehensive walkthrough [0].
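
To give a rough idea of how simple the Solr side is once your crawl has been
indexed: a search is just an HTTP GET against Solr's select handler. Below is
a minimal Python sketch that only builds such a request URL. The base URL
(http://localhost:8983/solr), the helper name and the example search terms are
my own assumptions for illustration; they are not taken from the tutorial, so
adjust them to your setup.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, terms, rows=10):
    """Build a URL for Solr's /select handler (hypothetical helper).

    base_url is assumed to point at the Solr instance that Nutch
    indexed into, e.g. http://localhost:8983/solr (an assumption).
    """
    params = {
        "q": terms,    # the search terms
        "rows": rows,  # maximum number of results to return
        "wt": "json",  # ask Solr for a JSON response
    }
    # urlencode escapes the values, e.g. spaces become '+'
    return base_url.rstrip("/") + "/select?" + urlencode(params)


print(build_solr_query_url("http://localhost:8983/solr", "apache nutch"))
```

Fetching the resulting URL with any HTTP client (or a browser) should return
the matching documents, provided Solr is running and the crawl was indexed.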

>
> Secondly, how does Nutch compare with a home-brew PHP scraper?

I can't say, as I haven't seen or used any home-brew scrapers.


> I'm really out
> of my depth here, am I looking at a tool that is extremely powerful and
> ready for a production environment,

Yes, it is a very well established, actively maintained web crawler with a
healthy user and development community. It is also a mature project within
the Apache Software Foundation.


> or is it still very much a development
> project on the side?
>

No, not at all. Nutch excels at the tasks you require.

Thanks

Lewis

[0] http://wiki.apache.org/nutch/NutchTutorial
