Hi,

On Tue, Jan 5, 2010 at 5:17 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> We need to back up, this is looking like an XY problem. That is,
> you're asking for specifics when what would probably be more
> helpful is for you to describe *what* the problem you're trying
> to solve is rather than *how* to make a specific behavior
> happen. Although re-reading your original e-mail does give a
> clue <G>....
>
> If, for instance, you really really want the string indexed and searched
> literally (if, for instance, it's a part number), you want to use something
> like WhitespaceTokenizerFactory, perhaps lowercasing too, rather
> than fiddle around with KeywordTokenizerFactory. If you want some
> other behavior, please explain it in more detail <G>...
>

I am indexing files that also include traffic captures, so they can contain pretty much anything. When searching for a long alphanumeric string I would have expected fewer results than when searching for a short one. But because of all the tokenizing, it returns more (useless) results. This is very disappointing because I could find these documents easily with grep.

What's even more disappointing: disabling the WordDelimiterFilterFactory (for query and/or text) just results in 0 hits on my document. I'm not quite sure what to do.

Ideally I would like to be able to search for strings such as a1a1a1a1a1a1a1 without matching a single "a" and/or "1".

Bernd
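
To illustrate the effect described above, here is a minimal Python sketch. It is not Solr code: the function `split_on_letter_digit_boundaries` is a hypothetical stand-in that only mimics splitting a term wherever letters and digits meet, and the sample document text is made up. It shows why a long alphanumeric query can end up matching unrelated text once it has been broken into single-character tokens, while a whitespace-only split leaves it intact.

```python
import re


def split_on_letter_digit_boundaries(term):
    """Hypothetical approximation of a word-delimiter-style split:
    each run of letters and each run of digits becomes its own token."""
    return re.findall(r"[A-Za-z]+|[0-9]+", term)


def whitespace_tokens(text):
    """Split on whitespace only and lowercase, leaving alphanumeric runs intact."""
    return text.lower().split()


query = "a1a1a1a1a1a1a1"
# Made-up document text that does not contain the query string at all.
doc = "packet 7 from host a to host b, retry 1"

# With letter/digit splitting, the query collapses to the tokens "a" and "1",
# both of which occur in the unrelated document.
split_overlap = (set(split_on_letter_digit_boundaries(query))
                 & set(split_on_letter_digit_boundaries(doc)))

# With whitespace-only tokenizing, the query stays a single token
# and shares nothing with the document.
whitespace_overlap = set(whitespace_tokens(query)) & set(whitespace_tokens(doc))

print(sorted(split_overlap))
print(sorted(whitespace_overlap))
```

Under these assumptions, the split version shares tokens with a document that never contains the full string, which would be consistent with the extra hits seen here.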