Hi all,

I know there are already a lot of threads about compound word search, but I haven't found anything that covers my specific problem yet. So if this has already been answered (which would be nice :)), any hints or search phrases for the mail archive would be appreciated.

Basically, I want users to be able to search my index for compound words that are not really compounds but merely terms that can be written in several ways.

For example, I have the categories "usb" and "cable" in my index, and I want users to be able to search for "usbcable", "usb-cable", etc. There is also "bluetooth" in the index, and I want a search for "blue tooth" to return the corresponding documents.

My approach is to use ShingleFilterFactory followed by WordDelimiterFilterFactory to index all possible combinations of adjacent words and get rid of intra-word delimiters. This nicely covers the first part of my requirements: somewhere along the analysis chain the terms "usb" and "cable" get concatenated, so "usbcable" ends up in the index.
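
To illustrate the idea, here is a rough Python simulation of what that index-side chain is meant to produce. This is only a sketch and not Solr/Lucene code: the function names (shingles, catenate, index_terms) are made up for this example, and the real WordDelimiterFilterFactory does more than this (for instance, it can also emit the split parts of "usb-cable").

```python
import re

def shingles(tokens, max_size=2):
    """Return single tokens plus adjacent-word groups up to max_size, space-joined."""
    out = []
    for i in range(len(tokens)):
        for n in range(1, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out

def catenate(term):
    """Remove intra-word delimiters (spaces, hyphens, other non-word characters)."""
    return re.sub(r"[\W_]+", "", term)

def index_terms(text):
    """Simplified stand-in for the shingle + word-delimiter analysis chain."""
    tokens = text.lower().split()
    terms = {catenate(s) for s in shingles(tokens)}
    terms.discard("")  # a token made only of delimiters collapses to nothing
    return sorted(terms)

print(index_terms("usb cable"))
print(index_terms("USB-cable"))
```

With shingles of at most two words, "usb cable" yields the single terms plus the concatenated form, which is why a search for "usbcable" can match.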

Now I also want to use this on the query side, so that the user input "blue tooth" (not as a phrase) would become "bluetooth" for this field and produce a hit. But this never happens, because with the DisMax Searcher the parser produces a query like this:

((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)

So the filters and analysers for this field never get to see the whole user query, only one whitespace-separated term at a time, and cannot perform their shingle and delimiter tasks :(
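
The effect can be reproduced with a small Python sketch. Again, this is a toy model under my own assumptions, not the actual DisMax code: analyze is a hypothetical stand-in for the field's analysis chain, and analyze_per_chunk mimics a parser that splits on whitespace before analysis.

```python
import re

def analyze(text, max_shingle=2):
    """Toy analyser: whitespace tokens, shingles up to max_shingle, delimiters removed."""
    tokens = text.lower().split()
    terms = set()
    for i in range(len(tokens)):
        for n in range(1, max_shingle + 1):
            if i + n <= len(tokens):
                terms.add(re.sub(r"[\W_]+", "", "".join(tokens[i:i + n])))
    terms.discard("")
    return terms

def analyze_per_chunk(query):
    """Split on whitespace first, then analyse each chunk on its own."""
    terms = set()
    for chunk in query.split():
        terms |= analyze(chunk)
    return terms

def analyze_whole(query):
    """Hand the complete user input to the analyser in one piece."""
    return analyze(query)

print(sorted(analyze_per_chunk("blue tooth")))
print(sorted(analyze_whole("blue tooth")))
```

Only the second variant ever sees "blue" and "tooth" next to each other, so only it can produce "bluetooth".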

So my question now is: how can I get this working? Is there a preferable way to deal with this compound word problem? Is there another query parser that already does the trick?

Or would it make sense to write my own query parser that passes the user query "as is" to the several fields?
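
Roughly what I have in mind, as a Python sketch only: the whole input is analysed once per field and the resulting terms are OR-ed together. The helper names (analyze, build_query) and the output syntax are assumptions for illustration; a real implementation would be a Java query parser plugin using each field's own analyser, and the field names are taken from the query shown above.

```python
import re

def analyze(text, max_shingle=2):
    """Toy analyser: whitespace tokens, shingles up to max_shingle, delimiters removed."""
    tokens = text.lower().split()
    terms = set()
    for i in range(len(tokens)):
        for n in range(1, max_shingle + 1):
            if i + n <= len(tokens):
                terms.add(re.sub(r"[\W_]+", "", "".join(tokens[i:i + n])))
    terms.discard("")
    return sorted(terms)

def build_query(user_input, fields=("category", "name")):
    """Pass the unsplit user input to every field and OR the analysed terms."""
    terms = analyze(user_input)
    if not terms:
        return ""
    clauses = []
    for field in fields:
        clauses.append("(" + " OR ".join(f"{field}:{t}" for t in terms) + ")")
    return " OR ".join(clauses)

print(build_query("blue tooth"))
```

This ignores scoring details such as the DisMax tie-breaker; it is only meant to show the query shape I would like to end up with.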

Any hints on this are welcome.

Thanks in advance
Tobias

--
Tobias Dittrich
- Head of Internet Development -
_________________________________
WAVE Computersysteme GmbH

Philipp-Reis-Str. 9
35440 Linden

Managing Director: Carsten Kellmann
Registered at Gießen district court, HRB 1823

Phone: +49 (0) 6403 / 9050 6001
Fax: +49 (0) 6403 / 9050 5089
mailto:dittr...@wave-computer.de
http://www.wave-computer.de
