Hi Jonathan,

>> I'm afraid I'm having trouble understanding "if the analyzer returns more
>> than one position back from a "queryparser token"
>> I'm not sure if "the queryparser forms a phrase query without explicit phrase
>> quotes" is a problem for me, I had no idea it happened until now, never
>> noticed, and still don't really understand in what circumstances it happens.

The problem I had was with the Boolean query "l'art AND historie": the
WordDelimiterFilter split "l'art" into two tokens, "l" at position 1 and "art"
at position 2. The queryparser then decided this meant a phrase query for "l"
followed immediately by "art". See
http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
for details.

This would happen whenever any token filter splits a token into more than one
token, each at its own position. For example, a filter that splits foo-bar
into "foo" "bar".

The exception is SynonymFilter or something like it. In the case of
SynonymFilter, it's not really a case of "splitting" one token into multiple
tokens; rather, given one token of input, it outputs all the synonyms of the
term. However, all of those tokens have the same position attribute.
(see: http://www.lucidimagination.com/search/document/CDRG_ch05_5.6.19?q=synonym%20filter)

So, for example, for the string "the small thing", if you had a synonym list
for small: "small=>tiny,teeny"

input:

  position | 1   | 2     | 3
  token    | the | small | thing

would output:

  position | 1   | 2     | 2    | 2     | 3
  token    | the | small | tiny | teeny | thing

In this case, when the queryparser gets back "small", "tiny" and "teeny",
they all have the same position, so they are not turned into a phrase query.

For "l'art":

input:

  position | 1
  token    | l'art

output:

  position | 1 | 2
  token    | l | art

In this case there are two tokens with different positions, so the queryparser
treats them as a phrase query.

Tom Burton-West
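
P.S. In case it helps, here is a rough Python sketch of the rule described above. It is only a toy model of the behaviour, not the actual Lucene queryparser code. The function name `interpret` and the (term, position) tuples are made up for illustration, and the output strings are just labels, not real query syntax.

```python
from typing import List, Tuple

# Hypothetical representation: what the analyzer hands back for ONE
# whitespace-separated "queryparser token", as (term, position) pairs.
Token = Tuple[str, int]


def interpret(tokens: List[Token]) -> str:
    """Toy model of how the analyzed tokens are turned into a query.

    Simplification: any input spanning more than one position is
    treated as a phrase, even if some positions hold several terms.
    """
    if not tokens:
        return "empty"

    terms = [term for term, _ in tokens]
    positions = {pos for _, pos in tokens}

    if len(tokens) == 1:
        # One token in, one token out: a plain term query.
        return "term:" + terms[0]

    if len(positions) == 1:
        # Several tokens stacked at the same position (e.g. synonyms):
        # alternatives for one term, no phrase query.
        return "any-of:" + "|".join(terms)

    # Tokens at different positions (e.g. "l'art" split into "l", "art"):
    # treated as an implicit phrase query.
    return 'phrase:"' + " ".join(terms) + '"'


if __name__ == "__main__":
    print(interpret([("l", 1), ("art", 2)]))
    print(interpret([("small", 2), ("tiny", 2), ("teeny", 2)]))
    print(interpret([("historie", 1)]))
```

The first call models the WordDelimiterFilter case (two positions, so a phrase), the second the SynonymFilter case (one shared position, so no phrase), and the third an ordinary single term.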