John Martyniak wrote:
Dennis,

Thanks for the information.

Can you tell me what the benefit of integrating with SOLR would be? It seems to me that the only gap between the two is that Nutch has a spider, and SOLR has incremental indexing, query warming, etc.

It is really about the size of the data and the type of usage. Nutch is specifically for web search, while Solr is IMO better for enterprise and restricted-domain search. Nutch uses MapReduce throughout; Solr doesn't (although indexes can be created in MR and served by Solr). Nutch has a crawler; Solr doesn't. Nutch has a distributed search server. Solr is working towards the same type of distributed search model. I think the biggest difference in terms of ideology is that web search is batch oriented (do a crawl, then process, analyze, and index it), while enterprise search is closer to real-time updates and dynamic changes.

So there are significant differences even though they can work together. The current integration work is to allow indexes created by Nutch to be served by Solr. If your domain is creating a full-text search from a database, or something like radius or location search, I would use Solr. If you want to create a large www or vertical search engine, I would use Nutch. If you have a large amount of data to crawl and/or process and still want to integrate with a database, I would use Nutch / Hadoop to acquire and process the data and Solr to serve it.


And the approximate timing of the next release?

Well, we were going to release 1.0 when Hadoop released 1.0. They were planning on doing that after version 0.17, but they have continued along the path to version 0.20, so I don't know exactly when a 1.0 release for Hadoop will be. My guess, although there are no firm plans, is within the next 1-2 months. Many patches are complete now and need to be integrated, then left to sit for a month or so to work out any bugs.

Dennis


-John

On Oct 22, 2008, at 1:29 PM, Dennis Kubes wrote:

We have been working on major feature upgrades for version 1. That took some time. It includes things like a new scoring framework, a new indexing framework, serving search results in XML and JSON, and integration with SOLR and HBase, among others. Not dead, just busy.

Dennis

John Martyniak wrote:
Ronny,
Thanks for the info.
Do you know what the approximate timing for that is (days, weeks, months)? And also the feature set?
-John
On Oct 21, 2008, at 7:50 PM, RONNY wrote:
Nutch is too young a project to die. The team is finalizing version 1.0.
Ronny


John Martyniak wrote:
Hi,

I have been playing around with Nutch for a little while, and I see a ton of emails on the mailing lists, but there hasn't been a formal build in more than a year.

Are there any plans? Is this project still being worked on?

Any thoughts would be greatly appreciated.

-John



