As I understood it setMinimumNumberShouldMatch(int min) Is used to specify a minimum number of the optional BooleanClauses which must be satisfied.
I haven't seen the implementation of setMinimumNumberShouldMatch but it seems a bit different than what is intended with the WAND operator, which can take any real number as threshold θ As stated in the paper: WAND(X1,w1, . . . Xk,wk, θ) is true iff X 1≤i≤k and SUM(xiwi)≥ θ where xi is the indicator variable for Xi, that is xi = 1, if Xi is true 0, otherwise. Observe that WAND can be used to implement AND and OR via AND(X1,X2, . . .Xk) ≡ WAND(X1, 1,X2, 1, . . . Xk, 1, k), and OR(X1,X2, . ..Xk) ≡ WAND(X1, 1,X2, 1, . ..Xk, 1, 1). What I find interesting is the idea of using a first pass using the upper bound (maximal) contribution of a term on any document score and the dynamic setting of the threshold θ to skip or to fully evaluate a document.. As stated in the paper: "Given this setup our preliminary scoring consists of evaluating for each document d WAND(X1,UB1,X2,UB2, . . .,Xk,UBk, θ), where Xi is an indicator variable for the presence of query term i in document d and the threshold θ is varied during the algorithm as explained below. If WAND evaluates to true, then the document d undergoes a full evaluation. The threshold θ is set dynamically by the algorithm based on the minimum score m among the top n results found so far, where n is the number of requested documents. The larger the threshold, the more documents will be skipped and thus we will need to compute full scores for fewer documents." I think its worth a try... -- Joaquin On Mon, Nov 16, 2009 at 2:54 AM, Andrzej Bialecki <a...@getopt.org> wrote: > > J. Delgado wrote: >> >> Here is the link to the paper. >> http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf >> >> A more recent application of the use and extension of the WAND operator for >> indexing of Boolean expressions: >> http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf >> >> -- Joaquin >> >> >> On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar < >> shalinman...@gmail.com> wrote: >> >>> Hey Joaquin, >>> >>> The mailing list strips off attachments. Can you please upload it somewhere >>> and give us the link? >>> >>> On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delg...@gmail.com >>>> >>>> wrote: >>>> Please find attached the paper on "Efficient Query Evaluation using a >>>> Two-Level Retrieval Process". I believe that such approach may improve >>> >>> the >>>> >>>> way Lucene/Solr evaluates queries today. > > The functionality of WAND (weak AND) is already implemented in Lucene, if I > understand it correctly - this is the BooleanQuery.setMinShouldMatch(int). > Lucene implements this probably differently from the algorithm described in > the paper, so there may be still some benefits from comparing the algorithms > in Lucene's BooleanScorer[2] with this one ... > > > -- > Best regards, > Andrzej Bialecki <>< > ___. ___ ___ ___ _ _ __________________________________ > [__ || __|__/|__||\/| Information Retrieval, Semantic Web > ___|||__|| \| || | Embedded Unix, System Integration > http://www.sigram.com Contact: info at sigram dot com >