Björn Wilmsmann wrote:
> 
> On 07.10.2006 at 17:40, Cristina Belderrain wrote:
> 
>> Let me remind you that all this must be done just to provide something
>> that's already there: Nutch is built on top of Lucene, after all. If
>> it's hard to understand why Lucene's capabilities were simply
>> neutralized in Nutch, it's even harder to figure out why no choice was
>> left to users by means of some configuration file.
> 
> I think this issue is rooted in the underlying philosophy of Nutch:
> Nutch was designed with a possible Google-sized (or similar) crawler
> and indexer in mind. Regular expressions and wildcard queries do not
> seem to fit this philosophy, as such queries would be far less
> efficient on a huge data set than simple boolean queries.
> 
> Nevertheless, I agree that there should be an option to choose the
> Lucene query engine instead of the Nutch-flavoured one, because Nutch
> has proven just as suitable for areas that do not require such
> efficient queries (intranet crawling, for instance) as it is for an
> all-out web indexing application.

Hi,

Most people could probably live without the full feature set, but basic
boolean queries were, I think, what started this topic. Is there an
"easier" way to allow them in Nutch as well, instead of throwing quite a
bit away and using the Lucene syntax? As has just been pointed out, it
seems quite a few things need to be "changed" to use Lucene search
instead of Nutch search. I don't think that is needed in most cases, but
I can see several situations where a boolean query would make sense.

(Currently I fetch up to 10,000 or so results using OpenSearch and
filter them in a script myself, since no "AND (site:... or site:...)" is
possible yet.)
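
To illustrate the kind of client-side filtering I mean, here is a minimal
sketch, not my actual script. It assumes the OpenSearch results have already
been parsed into dicts with a "link" key; the function name, the data layout
and the rule that subdomains also count as a match are all assumptions made
for the example, not anything Nutch provides.

```python
from urllib.parse import urlparse


def filter_by_sites(results, allowed_sites):
    """Emulate "AND (site:a OR site:b)" on results that were already fetched.

    results       -- iterable of dicts with a "link" key (hypothetical layout)
    allowed_sites -- host names to keep, e.g. ["example.org", "example.com"]
    """
    allowed = {site.lower() for site in allowed_sites}
    kept = []
    for item in results:
        # hostname is None for malformed URLs, so fall back to "".
        host = (urlparse(item["link"]).hostname or "").lower()
        # Assumption: an exact host match or any subdomain counts as a hit.
        if any(host == site or host.endswith("." + site) for site in allowed):
            kept.append(item)
    return kept


sample = [
    {"link": "http://example.org/a"},
    {"link": "http://www.example.com/b"},
    {"link": "http://other.net/c"},
    {"link": "http://notexample.org/d"},
]
matches = filter_by_sites(sample, ["example.org", "example.com"])
```

With the sample data, only the first two entries remain. The drawback is
obvious: every result has to be fetched first, which is why a boolean query
on the server side would be preferable.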


Regards,
 Stefan

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
