Otis:

There are many reasons I prefer Solr to Nutch:

1. I actually tried to do some of the crawling with Nutch, but found the crawling options less flexible than I would have liked.
2. I prefer the Solr approach in general. I have a long background in Verity and Autonomy search, and Solr is a bit closer to them than Nutch.
3. I really like the schema support in Solr.
4. I really really like the facets/parametric search in Solr.
5. I really really really like the REST interface in Solr.
6. Finally, and not to put too fine a point on it, Hadoop frightens the bejeebers out of me. I've skimmed some of the papers, and it looks like it will take a lot of study before I fully understand it. I'm not saying I'm stupid and lazy, but if the map-reduce algorithm fits, I'll wear it. Plus, I'm trying to get a mental handle on Jeff Hawkins' HTM and its application to the real world. It all makes my cerebral cortex itchy.

Thanks for the suggestion, though. I'll probably revisit Nutch if Heritrix lets me down. I had no luck getting the Nutch crawler Solr patch to work, either. Sadly, I'm the David Lee Roth of Java programmers - I may think that I'm hard-core, but I'm not, really. And my groupies are getting a bit saggy.

BTW - add my voice to the paeans of praise for Lucene in Action. You and Erik did a bang-up job, and I surely appreciate all the feedback you give on this forum, especially over the past few months as I feel my way through Solr and Lucene.



On Nov 22, 2007, at 10:10 PM, Otis Gospodnetic wrote:

The answer to that question, Norberto, would depend on versions.

George: why not just use straight Nutch and forget about Heritrix?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Norberto Meijome <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Sent: Thursday, November 22, 2007 5:54:32 PM
Subject: Re: Heritrix and Solr

On Thu, 22 Nov 2007 10:41:41 -0500
George Everitt <[EMAIL PROTECTED]> wrote:

After a lot of googling, I came across Heritrix, which seems to be the
most robust, well-supported open source crawler out there. Heritrix
has an integration with Nutch (NutchWax), but not with Solr. I'm
wondering if anybody can share any experience using Heritrix with
Solr.

Out on a limb here... both Nutch and Solr use Lucene for the actual
indexing / searching. Would the indexes generated with Nutch be compatible
with / readable by Solr?

_________________________
{Beto|Norberto|Numard} Meijome

"Why do you sit there looking like an envelope without any address on
it?"
 Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when
wet. Reading disclaimers makes you go blind. Writing them is worse.
You have been Warned.




