: I have the following use case. I could implement the solution, but
: performance is affected. I need some smart ways of doing this.
:
: Use Case:
:
: Incoming data has two fields which have values like 'WAL MART STORES INC'
: and 'wal-mart-stores-inc'.
: Users can search the data as 'walmart', 'wal mart' or 'wal-mart', and
: also partially on any part of the name from the start of a word, like 'wal',
: 'walm', 'wal m', etc. I could get the solution by using two indexes: one
: as a text field for the first field (wal mart) and one with subwords for
: wal-mart-stores (with the WordDelimiterFilterFactory filter).
There are lots of solutions that could work, all depending on what *else* you need to be able to match on besides just prefix queries where whitespace/punctuation are ignored.

One example: use KeywordTokenizer, along with a PatternReplaceFilter that throws away non-letter characters and a LowerCaseFilter, and then issue all your queries as PrefixQueries (applying the same letters-only, lowercase normalization to the user's input first, so "wal m" becomes "walm"). That will get w* wa* wal* and walm* to all match "wal mart", "WALMART", "WAL-mart", etc. But it won't let "mart" match a document containing "wal mart", because the whole value is indexed as a single term. You can always use copyField and hit one field for the first type of query, and the other field for "normal" queries.

Depending on the nature of your data (i.e. how many documents, how common certain prefixes are, etc.) you might get better performance, at the expense of a larger index, if you use something like the EdgeNGramTokenFilter or EdgeNGramTokenizer to index all the prefixes of various sizes so you don't need to do a prefix query at all.

The bottom line: there are *lots* of options, and you'll need to experiment to find the right solution that matches when you want to match, and doesn't when you don't.

-Hoss
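A rough way to see how these options behave, without setting up Solr, is to simulate the analysis chains in plain Python. This is only a sketch under assumptions: the helper names (normalize_keyword, prefix_match, edge_ngrams, word_tokens) are hypothetical, "letters" means ASCII A-Z, and the functions merely mimic what KeywordTokenizer + PatternReplaceFilter + LowerCaseFilter, a PrefixQuery, an EdgeNGramTokenFilter, and a "normal" copyField target do to the text. None of this is the Lucene/Solr API.

```python
import re

# Hypothetical helpers that only *mimic* Lucene/Solr analysis in plain Python.
# Assumption: "letters" means ASCII A-Z/a-z; real configs may differ.


def normalize_keyword(value: str) -> str:
    """Like KeywordTokenizer + PatternReplaceFilter + LowerCaseFilter:
    the whole value is one token, non-letters are dropped, then lowercased."""
    return re.sub(r"[^A-Za-z]", "", value).lower()


def prefix_match(indexed_term: str, user_input: str) -> bool:
    """Like a PrefixQuery on the single indexed term. The user's input gets
    the same normalization first, so 'wal m' and 'wal-m' both become 'walm'."""
    prefix = normalize_keyword(user_input)
    return bool(prefix) and indexed_term.startswith(prefix)


def edge_ngrams(term: str, min_gram: int = 1, max_gram: int = 25) -> set[str]:
    """Like a front-edge n-gram filter: index every prefix of the term whose
    length is in [min_gram, max_gram], so a query becomes an exact term lookup.
    The min_gram/max_gram defaults are arbitrary values for illustration."""
    upper = min(max_gram, len(term))
    return {term[:n] for n in range(min_gram, upper + 1)}


def word_tokens(value: str) -> set[str]:
    """Stand-in for the 'normal' copyField target: split on non-letters and
    lowercase each word, so 'mart' is a term of its own."""
    return {w.lower() for w in re.split(r"[^A-Za-z]+", value) if w}


if __name__ == "__main__":
    doc = "WAL MART STORES INC"
    keyword = normalize_keyword(doc)  # single term: "walmartstoresinc"
    grams = edge_ngrams(keyword)      # "w", "wa", ..., "walmartstoresinc"
    words = word_tokens(doc)          # {"wal", "mart", "stores", "inc"}
    for q in ["wal", "walm", "wal m", "wal-mart", "WALMART", "mart"]:
        norm = normalize_keyword(q)
        print(f"{q!r}: prefix={prefix_match(keyword, q)} "
              f"ngram={norm in grams} words={norm in words}")
```

The n-gram variant turns every query into a plain term lookup, at the cost of indexing up to max_gram terms per value (queries longer than max_gram letters would miss). The prefix variant indexes one term per value and does the expansion at query time. Neither lets "mart" match on its own; that is what the separate word field is for.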