> I have been playing around with Nutch and find it to
> be a nice, complete search package. I have a couple of
> questions, mostly about the web db component.
> 
> How well does it scale? For example, most RDBMSs slow
> down after a certain number of tables, i.e. 4 or 5
> million. How many entries can Nutch deal with? Are
> there any plans to change this? Just curious.

Nutch is built to scale.  See Yahoo Labs, where the index apparently
has 200M entries.  Nutch is also built with distributed searching in
mind.

> When reading the Page/Brin paper on the original
> Google, at one point it mentions that before the
> indexes are queried by users, they are "inverted
> indexes". Does anyone know why this is done? Does
> Nutch do this?

Nutch uses Lucene, and Lucene does use inverted indices.
Inverted indices are very common for search engine applications, as
they allow a fast keyword -> document (e.g. web page) lookup.
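
To make the keyword -> document idea concrete, here is a minimal sketch in
Python. It is only an illustration of the general technique, not how Lucene
or Nutch actually store their indices; the function names and the sample
documents are made up for the example, and tokenization is just lowercase
whitespace splitting.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return []
    # AND semantics: intersect the posting sets of all query terms.
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Nutch is a web search engine",
    2: "Lucene is a search library",
    3: "Nutch uses Lucene",
}
index = build_inverted_index(docs)
```

A lookup such as search(index, "nutch lucene") only touches the posting
lists for those two terms instead of scanning every document, which is why
the inversion is done before users query the index.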

Otis

> Thank you very much for your time.
> -Yousef
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
> Project Admins to receive an Apple iPod Mini FREE for your judgement
> on
> who ports your project to Linux PPC the best. Sponsored by IBM.
> Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
> _______________________________________________
> Nutch-general mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-general
> 


