Andrzej Bialecki wrote:
> The degree of simplification is very substantial. Our NutchSuperQuery 
> doesn't have to do much more work than a simple TermQuery, so we can 
> assume that the cost to run it is the same as TermQuery times some 
> constant. What we gain then is the cost of not running all those boolean 
> clauses ...

The NutchSuperQuery would have to do more work than a TermQuery: it 
must apply boosts, its postings would be longer, and those postings 
would compress more poorly. So while there'd probably be some 
improvement, it wouldn't be quite as fast as a single-term query.

> If you're still with me at this point I must congratulate you. :) 
> However, that's as far as I thought it through for now - let the 
> discussion start! If you are a Lucene hacker I would gladly welcome your 
> review or even code contributions .. ;)

An implementation to consider is payloads.  If each posting has a weight 
attached, then the fieldBoost*fieldNorm could be stored there, and a 
simple gap-based method could be used to inhibit cross-field matches. 
Queries would look similar to your proposed approach.

http://www.gossamer-threads.com/lists/lucene/java-dev/37409
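
To make the idea concrete, here is a minimal sketch, not Lucene code. It assumes all fields of a document are merged into one position space, that a fixed gap (the made-up constant FIELD_GAP) separates consecutive fields, and that the norm is 1/sqrt(token count). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in Lucene or Nutch. */
final class MergedFieldPostings {

    /** Assumed gap inserted between fields in the shared position space. */
    static final int FIELD_GAP = 1000;

    /** One posting: a term, its merged position, and its payload weight. */
    static final class Posting {
        final String term;
        final int position;
        final float weight; // fieldBoost * fieldNorm, stored as the payload

        Posting(String term, int position, float weight) {
            this.term = term;
            this.position = position;
            this.weight = weight;
        }
    }

    /** Merges per-field tokens into one posting stream, in map iteration order. */
    static List<Posting> merge(Map<String, String[]> fieldTokens,
                               Map<String, Float> fieldBoosts) {
        List<Posting> out = new ArrayList<>();
        int base = 0;
        for (Map.Entry<String, String[]> e : fieldTokens.entrySet()) {
            String[] tokens = e.getValue();
            float boost = fieldBoosts.getOrDefault(e.getKey(), 1.0f);
            // Assumed length norm: 1/sqrt(number of tokens in the field).
            float norm = tokens.length == 0 ? 0f : (float) (1.0 / Math.sqrt(tokens.length));
            float weight = boost * norm;
            for (int i = 0; i < tokens.length; i++) {
                out.add(new Posting(tokens[i], base + i, weight));
            }
            // Skip ahead so the next field starts more than FIELD_GAP away.
            base += tokens.length + FIELD_GAP;
        }
        return out;
    }

    /** True if two postings are close enough to count as a proximity match. */
    static boolean withinSlop(Posting a, Posting b, int slop) {
        return Math.abs(a.position - b.position) <= slop;
    }
}
```

With this layout, any proximity check whose slop is at most FIELD_GAP cannot pair a token from one field with a token from another, and every posting from the same field carries the same weight.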

One might optimize the payload implementation with run-length 
compression: if a run of postings has the same payload, it could be 
represented once at the start of the run, along with the run's length. 
That would keep postings small, reducing I/O.
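
A minimal sketch of the run-length idea, assuming each payload is a single float weight and ignoring the on-disk encoding entirely. PayloadRuns and Run are hypothetical names for illustration, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: run-length encodes per-posting payload weights. */
final class PayloadRuns {

    /** One run: `length` consecutive postings sharing the same `weight`. */
    static final class Run {
        final float weight;
        final int length;

        Run(float weight, int length) {
            this.weight = weight;
            this.length = length;
        }
    }

    /** Collapses consecutive equal weights into (weight, length) runs. */
    static List<Run> encode(float[] weights) {
        List<Run> runs = new ArrayList<>();
        int i = 0;
        while (i < weights.length) {
            int start = i;
            float w = weights[i];
            // Advance while the payload is unchanged.
            while (i < weights.length && Float.compare(weights[i], w) == 0) {
                i++;
            }
            runs.add(new Run(w, i - start));
        }
        return runs;
    }

    /** Expands runs back into one weight per posting. */
    static float[] decode(List<Run> runs) {
        int total = 0;
        for (Run r : runs) {
            total += r.length;
        }
        float[] out = new float[total];
        int pos = 0;
        for (Run r : runs) {
            for (int k = 0; k < r.length; k++) {
                out[pos++] = r.weight;
            }
        }
        return out;
    }
}
```

If the payload is fieldBoost*fieldNorm, all postings from one field of one document share a weight, so runs should be common and the encoded form correspondingly short.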

Doug

