Hi James,

On Fri, Feb 24, 2012 at 2:47 PM, Spadez <[email protected]> wrote:
> However, having found Nutch, it seems like this might be something worth
> looking at. Firstly, is Nutch simply a web scraper or does it integrate
> other aspects of Lucene as well? I'm wondering if I would need to install
> Nutch and Solr together, or if Nutch integrates the search system as well.

You need to set up Nutch to crawl the web or filesystem of your choice; after that, the process of communicating with Solr is trivial. Please see this tutorial for a comprehensive walkthrough [0].

> Secondly, how does Nutch compare with a home brew PHP scraper.

I haven't got a clue, as I haven't seen or used any home brew scrapers.

> I'm really out
> of my depth here, am I looking at a tool that is extremely powerful and
> ready for a production environment,

Yes, it is a very well established, actively maintained web crawler with a healthy user and development community. It is also a mature project within the Apache Software Foundation.

> or is it still very much a development
> project on the side?

No, not at all. Nutch excels at covering the tasks you require.

Thanks
Lewis

[0] http://wiki.apache.org/nutch/NutchTutorial

