First of all: sorry Chris, Walter ... I did not mean to put
pressure on anyone. It's just that when you're stuck on
something, there's that little needle stinging, saying:
maybe you're just too damn stupid for this ... :) So, thanks
a lot for your answers.
As for index-time expansion using synonyms: I think this is
not an option for me, since it would mean that I have to a)
find all the words that might cause problems and b) find
every variant that customers might possibly use. And then I
would have to keep all my synonym files up to date. But the
main design goal for my search implementation is little to
no maintenance.
My original assumption about the DisMax handler was that it
would simply take the original query string and pass it to
every field in its field list, using each field's configured
analyzer stack, maybe add some extra clauses for the special
options at the end ... and then send the query to Lucene.
Can you explain why this approach was not chosen?
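Just to make my assumption concrete: I expected a request
roughly like

  q=usb cable&qf=category name&defType=dismax

(parameters only sketched from my earlier example) to hand
the complete string "usb cable" to the analyzers of the
category and name fields, so that a compound-aware
TokenFilter would get to see both tokens together.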
Thanks
Tobi
Chris Hostetter wrote:
: Hmmm was my mail so weird or my question so stupid ... or is there simply
: no one with an answer? Not even a hint? :(
Patience, my friend: I've got a backlog of ~500 Lucene-related messages in
my INBOX, and I was just reading your original email when this reply came
in.
In general this is a fairly hard problem ... the easiest solution I know
of that works in most cases is to do index-time expansion using the
SynonymFilter, so that regardless of whether a document contains "usbcable",
"usb-cable", or "usb cable", all three variants get indexed, and the user
can then search for any of them.
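For example, a minimal sketch of such an index-time setup in schema.xml
might look something like this (the field type name and the synonyms file
name are just made up):

  <fieldType name="text_syn" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- expand="true" indexes every variant listed in synonyms.txt -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

with a synonyms.txt line along the lines of:

  usbcable, usb-cable, usb cable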
The downside is that it can throw off your tf/idf stats for some terms (if
they appear both by themselves and as part of a compound), and it can result
in false positives for esoteric phrase searches (but that tends to be more
of a theoretical problem than an actual one).
: > But this never happens since with the DisMax Searcher the parser produces a
: > query like this:
: >
: > ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)
...
: > to deal with this compound word problem? Is there another query parser that
: > already does the trick?
Take a look at the FieldQParserPlugin ... it passes the raw query string
to the analyzer of a specified field -- this would let your TokenFilters
see the whole "stream" of tokens (which isn't possible with the conventional
QueryParser tokenization rules), but it doesn't have any of the
"field/query matrix cross product" goodness of dismax -- you'd only be
able to query the one field.
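For example (field name borrowed from the query you quoted above), a
request along the lines of

  q={!field f=name}usb cable

would hand the entire string "usb cable" to the analyzer of the name
field instead of pre-splitting it on whitespace.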
(Hmmm ... I wonder if a DisMaxQParser 2.0 could have an option to let you
specify a FieldType whose analyzer is used to tokenize the query string
instead of the Lucene QueryParser's JavaCC tokenization, and *then* the
tokens resulting from that initial analysis could be passed to the
analyzers of the various qf fields ... hmmm, that might be just crazy
enough to be too crazy to work.)
-Hoss