Björn Wilmsmann wrote:
>
> On 07.10.2006 at 17:40, Cristina Belderrain wrote:
>
>> Let me remind you that all this must be done just to provide something
>> that's already there: Nutch is built on top of Lucene, after all. If
>> it's hard to understand why Lucene's capabilities were simply
>> neutralized in Nutch, it's even harder to figure out why no choice was
>> left to users by means of some configuration file.
>
> I think this issue is rooted in the underlying philosophy of Nutch:
> Nutch was designed with the idea of a possible Google-sized (and the
> like) crawler and indexer in mind. Regular expressions and
> wildcard queries do not seem to fit into this philosophy, as such
> queries would be far less efficient on a huge data set than simple
> boolean queries.
>
> Nevertheless, I agree that there should be an option to choose the
> Lucene query engine instead of the Nutch-flavoured one, because Nutch has
> proven to be just as suitable for areas that do not require such
> efficient queries (intranet crawling, for instance) as for an all-out
> web indexing application.

Hi,

if it's not the full feature set, maybe most people could live with it.
But basic boolean queries, I think, were the root of this topic. Is there
an "easier" way to allow these in Nutch as well, instead of throwing quite
a bit away and using the Lucene syntax?

As has just been pointed out, it seems quite a few things need to be
changed to use the Lucene search instead of the Nutch search. I don't
think that's needed in most cases, but I do see several situations where a
boolean query would make sense. (Currently I fetch up to 10,000 or so
results using OpenSearch and filter them in a script myself, since no
"AND (site:... or site:...)" is possible yet.)

Regards,
Stefan

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
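
To illustrate the workaround Stefan describes (fetching a large result set and applying the "site:... or site:..." restriction on the client), here is a minimal sketch. It is not his script. It assumes the OpenSearch results have already been parsed into dictionaries with a `link` key; that key, the function name `filter_by_sites`, and the sample URLs are hypothetical. Treating subdomains as matches is also an assumption made here, not documented Nutch `site:` behaviour.

```python
from urllib.parse import urlparse


def filter_by_sites(results, allowed_sites):
    """Keep results whose URL host equals, or is a subdomain of, an allowed site.

    `results` is a list of dicts with a "link" key (hypothetical shape);
    `allowed_sites` is an iterable of host names to OR together.
    """
    allowed = {site.lower() for site in allowed_sites}
    kept = []
    for item in results:
        # hostname is None for malformed URLs, so fall back to "".
        host = (urlparse(item["link"]).hostname or "").lower()
        if any(host == site or host.endswith("." + site) for site in allowed):
            kept.append(item)
    return kept


# Made-up sample data standing in for parsed search results.
results = [
    {"title": "A", "link": "http://www.example.org/a.html"},
    {"title": "B", "link": "http://other.example.net/b.html"},
    {"title": "C", "link": "http://example.com/c.html"},
]

filtered = filter_by_sites(results, ["example.org", "example.com"])
for item in filtered:
    print(item["title"], item["link"])
```

The obvious drawback, as noted in the post, is that every candidate hit has to be transferred before the filter can run, which is why a server-side boolean query would be preferable.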
