Hi all,
I know there are already a lot of topics about compound word
search, but I haven't found anything for my specific problem
yet. So if this has already been answered (which would be
nice :)), any hints or search phrases for the mail archive
would be appreciated.
Basically, I want users to be able to search my index for
compound words that are not really compounds but merely
terms that can be written in several ways.
For example I have the categories "usb" and "cable" in my
index and I want the user to be able to search for
"usbcable" or "usb-cable" etc. Also there is "bluetooth" in
the index and I want the search for "blue tooth" to return
the corresponding documents.
My approach is to use ShingleFilterFactory followed by
WordDelimiterFilterFactory to index all possible
combinations of words and get rid of intra-word delimiters.
This nicely covers the first part of my requirements since
the terms "usb" and "cable" somewhere along the process get
concatenated and "usbcable" is in the index.
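As a rough illustration of the index-side behaviour described above, here is a toy Python model. It is not Solr code and not the actual filter chain; the function names (shingles, strip_delimiters, index_terms) and the maximum shingle size of 2 are assumptions made up for this sketch.

```python
import re


def shingles(tokens, max_size=2):
    """Return every run of 1..max_size adjacent tokens, joined by a space."""
    out = []
    for i in range(len(tokens)):
        for n in range(1, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


def strip_delimiters(term):
    """Drop whitespace, hyphens and underscores inside a term."""
    return re.sub(r"[\s\-_]+", "", term)


def index_terms(text):
    """Toy index-time analysis: lowercase, shingle, then remove delimiters."""
    tokens = text.lower().split()
    return sorted({strip_delimiters(s) for s in shingles(tokens)})


print(index_terms("USB cable"))   # ['cable', 'usb', 'usbcable']
print(index_terms("usb-cable"))   # ['usbcable']
```

In this model both "USB cable" and "usb-cable" end up producing the term "usbcable", which is the effect wanted on the index side.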
Now I also want to use this on the query side, so the user
input "blue tooth" (not as a phrase) would become "bluetooth"
for this field and produce a hit. But this never happens
since with the DisMax Searcher the parser produces a query
like this:
((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)
And the filters and analysers for this field never get to
see the whole user query and cannot perform their shingle
and delimiter tasks :(
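The difference between the two behaviours can be sketched with another toy Python model. Again, this is not Solr code: analyze, per_token_query and whole_query are hypothetical names, and the analyzer is simplified to "emit each token plus each adjacent pair concatenated".

```python
def analyze(text):
    """Toy field analyzer: lowercase, split on whitespace, then emit each
    token plus every pair of adjacent tokens concatenated."""
    tokens = text.lower().split()
    pairs = [a + b for a, b in zip(tokens, tokens[1:])]
    return tokens + pairs


def per_token_query(user_input):
    """Mimic a parser that splits on whitespace first and analyzes each
    chunk on its own, so the analyzer never sees two adjacent words."""
    return [analyze(chunk) for chunk in user_input.split()]


def whole_query(user_input):
    """Mimic passing the user input to the analyzer as one string."""
    return analyze(user_input)


print(per_token_query("blue tooth"))  # [['blue'], ['tooth']]
print(whole_query("blue tooth"))      # ['blue', 'tooth', 'bluetooth']
```

Only the second variant can produce "bluetooth", because only there does the analyzer get both words at once.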
So my question now is: how can I get this working? Is there
a preferable way to deal with this compound word problem? Is
there another query parser that already does the trick?
Or would it make sense to write my own query parser that
passes the user query "as is" to the several fields?
Any hints on this are welcome.
Thanks in advance
Tobias
--
Tobias Dittrich
- Leiter Internet-Entwicklung -
_________________________________
WAVE Computersysteme GmbH
Philipp-Reis-Str. 9
35440 Linden
Geschäftsführer: Carsten Kellmann
Registergericht Gießen HRB 1823
Fon: +49 (0) 6403 / 9050 6001
Fax: +49 (0) 6403 / 9050 5089
mailto:dittr...@wave-computer.de
http://www.wave-computer.de