Hmmm, was my mail so weird or my question so stupid ... or is
there simply no one with an answer? Not even a hint? :(
Tobias Dittrich wrote:
Hi all,
I know there are a lot of topics about compound word search already but
I haven't found anything for my specific problem yet. So if this is
already answered (which would be nice :)) then any hints or search
phrases for the mail archive would be appreciated.
Basically I want users to be able to search my index for compound words
that are not really compounds but merely terms that can be written in
several ways.
For example I have the categories "usb" and "cable" in my index and I
want the user to be able to search for "usbcable" or "usb-cable" etc.
Also there is "bluetooth" in the index and I want the search for "blue
tooth" to return the corresponding documents.
My approach is to use ShingleFilterFactory followed by
WordDelimiterFilterFactory to index all possible combinations of words
and get rid of intra-word delimiters. This nicely covers the first part
of my requirements since the terms "usb" and "cable" somewhere along the
process get concatenated and "usbcable" is in the index.
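
A minimal sketch of this index-side idea, written in plain Python rather than with Solr's actual filters. The function names (shingles, catenate, index_terms) are made up for illustration, and a shingle size of 2 is an assumption. It only shows how adjacent tokens can end up fused into one index term:

```python
import re

def shingles(tokens, max_size=2):
    """Return every run of 1..max_size adjacent tokens, joined by a space."""
    out = []
    for i in range(len(tokens)):
        for n in range(1, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out

def catenate(term):
    """Drop intra-word delimiters (spaces, hyphens, ...) so the parts fuse."""
    return re.sub(r"[^0-9a-z]+", "", term.lower())

def index_terms(text):
    """Toy analysis chain: whitespace tokenize, shingle, then catenate."""
    tokens = text.lower().split()
    return sorted({catenate(s) for s in shingles(tokens)})

print(index_terms("USB cable"))   # ['cable', 'usb', 'usbcable']
print(index_terms("usb-cable"))   # ['usbcable']
```

Both spellings produce the term "usbcable", which is why a search for either form can match on the index side.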
Now I also want to use this on the query side, so the user input "blue
tooth" (not as phrase) would become "bluetooth" for this field and
produce a hit. But this never happens since with the DisMax Searcher the
parser produces a query like this:
((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)
And the filters and analysers for this field never get to see the whole
user query and cannot perform their shingle and delimiter tasks :(
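
To illustrate the difference, here is a toy Python model. It is not Solr code: analyze is a hypothetical stand-in for the field's analysis chain, and the whitespace split stands in for what the query parser does before analysis:

```python
def analyze(text, max_size=2):
    """Toy field analyzer: emit catenated runs of up to max_size adjacent words."""
    words = text.lower().split()
    terms = []
    for i in range(len(words)):
        for n in range(1, max_size + 1):
            if i + n <= len(words):
                terms.append("".join(words[i:i + n]))
    return terms

user_query = "blue tooth"

# What effectively happens: the input is split on whitespace first,
# and the analyzer runs once per chunk, so it never sees two words together.
per_clause = [analyze(chunk) for chunk in user_query.split()]

# What I would need: the analyzer sees the whole input at once.
whole = analyze(user_query)

print(per_clause)  # [['blue'], ['tooth']]
print(whole)       # ['blue', 'bluetooth', 'tooth']
```

Only the second variant can produce "bluetooth", because a shingle needs at least two tokens in the same analysis pass.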
So my question now is: how can I get this working? Is there a preferable
way to deal with this compound word problem? Is there another query
parser that already does the trick?
Or would it make sense to write my own query parser that passes the user
query "as is" to the several fields?
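
One possible workaround, sketched here in Python as an assumption rather than a tested solution, is to expand the query on the client before it reaches the parser. Note that this is a different technique from a custom query parser. expand_query is a hypothetical helper, and how the extra terms interact with the parser's minimum-match setting would still need checking:

```python
def expand_query(user_query, max_size=2):
    """Append catenated forms of adjacent word runs to the raw query.

    Hypothetical client-side helper; it only builds a query string.
    """
    words = user_query.split()
    extra = []
    for n in range(2, max_size + 1):
        for i in range(len(words) - n + 1):
            joined = "".join(words[i:i + n])
            if joined not in words and joined not in extra:
                extra.append(joined)
    return " ".join(words + extra)

print(expand_query("blue tooth headset"))
# blue tooth headset bluetooth toothheadset
```

The parser would then see "bluetooth" as its own whitespace-separated chunk and could match it against the field like any other term.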
Any hints on this are welcome.
Thanks in advance
Tobias
--
Tobias Dittrich
- Head of Internet Development -
_________________________________
WAVE Computersysteme GmbH
Philipp-Reis-Str. 9
35440 Linden
Managing Director: Carsten Kellmann
Register court: Gießen, HRB 1823
Phone: +49 (0) 6403 / 9050 6001
Fax: +49 (0) 6403 / 9050 5089
mailto:dittr...@wave-computer.de
http://www.wave-computer.de