Re: [ANNOUNCE] Web Crawler

Dominique Bejean Wed, 02 Mar 2011 03:00:17 -0800

David,

The UI was not the only reason that made me choose to write a totally new crawler. After eliminating candidate crawlers for various reasons (inactive project, ...), Nutch and Heritrix were the two crawlers on my short list of possible candidates.

In my mind, the crawler and the pipeline have to be totally disconnected from the target repository (Solr, ...). This made Nutch not a possible choice. In the end, I found Heritrix too far from the solution architecture I had imagined.


Dominique


On 02/03/11 05:41, David Smiley (@MITRE.org) wrote:

Dominique,
The obvious number one question is of course why you re-invented this wheel
when there are several existing crawlers to choose from.  Your website says
the reason is that the UIs on existing crawlers (e.g. Nutch, Heritrix, ...)
weren't sufficiently user-friendly or lacked the site-specific configuration
you wanted.  Well if that is the case, why didn't you add/enhance such
capabilities for an existing crawler?

~ David Smiley

-----
  Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
