David,

The UI was not the only reason I chose to write a totally new crawler. After eliminating candidate crawlers for various reasons (inactive project, ...), Nutch and Heritrix were the two crawlers left on my short list of possible candidates.

In my mind, the crawler and the pipeline have to be totally decoupled from the target repository (Solr, ...). This ruled out Nutch. In the end, I found Heritrix too far from the solution architecture I had in mind.

Dominique


On 02/03/11 05:41, David Smiley (@MITRE.org) wrote:
Dominique,
The obvious number one question is of course why you re-invented this wheel
when there are several existing crawlers to choose from.  Your website says
the reason is that the UIs on existing crawlers (e.g. Nutch, Heritrix, ...)
weren't sufficiently user-friendly or lacked the site-specific configuration
you wanted.  Well if that is the case, why didn't you add/enhance such
capabilities for an existing crawler?

~ David Smiley

-----
  Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
