Hi
I am hoping someone can point me in the right direction with regard to
indexing words that are concatenated together to form other words or product
names.
We have indexed a product database and have come across some search terms
for which zero results are returned. There are products in the index with
'Borderlands xxx xxx' and 'Dragonfly xx xxx' in the title, yet searches for
'Borderland' or 'Border Land' and for 'Dragon Fly' return zero results.
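To see why these searches come back empty, here is a minimal Java sketch, with no Solr or Lucene dependency, of exact term lookup versus prefix lookup. It assumes a simple lowercasing whitespace analyzer, and the class and method names are hypothetical. An exact lookup of 'borderland' misses the indexed term 'borderlands', while a prefix lookup, which is roughly what indexing edge n-grams gives you, finds it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class, not Solr/Lucene code: compares exact term lookup
// with prefix lookup over lowercased, whitespace-separated tokens.
final class PrefixMatchDemo {

    // Lowercase and split on whitespace (a stand-in for a simple analyzer).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Exact matching: the query term must equal an indexed term.
    static boolean exactMatch(Set<String> indexTerms, String queryTerm) {
        return indexTerms.contains(queryTerm);
    }

    // Prefix matching: the query term only has to be a leading substring
    // of an indexed term, which is what indexing edge n-grams approximates.
    static boolean prefixMatch(Set<String> indexTerms, String queryTerm) {
        for (String term : indexTerms) {
            if (term.startsWith(queryTerm)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> index = new HashSet<>(tokenize("Borderlands xxx xxx"));
        System.out.println(exactMatch(index, "borderland"));  // false
        System.out.println(prefixMatch(index, "borderland")); // true
    }
}
```

With the same prefix lookup, 'Border Land' matches on 'border' but not on 'land', so whether that query returns anything depends on the default query operator (OR versus AND).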
Using synonyms might be a better solution, because
EdgeNGramTokenizerFactory has the potential to create a large number
of tokens, which will artificially inflate the number of tokens in the
index and, in turn, affect the IDF score.
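The token blow-up is easy to quantify. The sketch below generates front-edge n-grams for a single word; it imitates what an edge n-gram tokenizer emits but is not Lucene's implementation, and the minGram/maxGram values are illustrative assumptions. One 11-character title word becomes 11 tokens, and short grams such as 'b' end up shared by many unrelated products.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of front-edge n-gram generation (prefixes only).
final class EdgeNGramCount {

    // Returns the prefixes of term with lengths minGram..min(maxGram, term.length()).
    static List<String> edgeGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "borderlands" has 11 characters, so grams of length 1..11 give 11 tokens
        List<String> grams = edgeGrams("borderlands", 1, 25);
        System.out.println(grams.size());                      // 11
        System.out.println(grams.get(0) + " " + grams.get(5)); // b border
    }
}
```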
A query for borderland should have returned
Well, I don't see a reason as to why
Would you mind explaining how omitNorms has any effect on the IDF problem
I described earlier?
I agree with your second sentence. I had to use the NGramTokenFilter to
accommodate partial matches.
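Unlike edge n-grams, which only yield prefixes, a plain n-gram filter emits every substring within the size range, which is why it can accommodate partial matches such as 'fly' inside 'dragonfly'. A minimal sketch, assuming a 3 to 6 character gram range; the class name is hypothetical and the gram order here need not match NGramTokenFilter's. Note that it produces even more tokens per word than the edge variant.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of plain (not edge) n-gram generation: every substring
// whose length lies in [minGram, maxGram], so inner fragments are indexed too.
final class NGramPartialMatch {

    static Set<String> nGrams(String term, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>(); // de-duplicates repeated grams
        for (int len = minGram; len <= maxGram; len++) {
            for (int start = 0; start + len <= term.length(); start++) {
                grams.add(term.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        Set<String> grams = nGrams("dragonfly", 3, 6);
        System.out.println(grams.contains("fly"));    // true: inner gram
        System.out.println(grams.contains("dragon")); // true: leading gram
        System.out.println(grams.size());             // 22
    }
}
```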
On 10/05/2009 12:11 PM, Avlesh Singh wrote:
Using synonyms might be a better solution because
FYI, if you don't want to turn off norms entirely, try this option in
Lucene 2.9's DefaultSimilarity:
    public void setDiscountOverlaps(boolean v)
Determines whether overlap tokens (tokens with 0 position increment)
are ignored when computing norm. By default this is false, meaning
overlap tokens are counted just like non-overlap tokens.
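As a rough illustration of what discounting overlaps changes, the sketch below computes a length norm of the form 1/sqrt(numTerms), the shape DefaultSimilarity uses, with and without subtracting tokens at position increment 0 (for example, synonyms stacked on the same position). It ignores field boosts and the lossy one-byte encoding Lucene applies to norms; the class name and token counts are made up.

```java
// Hypothetical sketch of a length norm with and without discounting overlap
// tokens (tokens at position increment 0). Field boosts and norm encoding
// are deliberately ignored.
final class OverlapNormSketch {

    static double lengthNorm(int numTokens, int numOverlap, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? numTokens - numOverlap : numTokens;
        if (numTerms <= 0) {
            return 0.0; // sketch-only guard for an empty field
        }
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        // Made-up field: 4 positional tokens plus 12 stacked at position increment 0
        int numTokens = 16;
        int numOverlap = 12;
        System.out.println(lengthNorm(numTokens, numOverlap, false)); // 0.25
        System.out.println(lengthNorm(numTokens, numOverlap, true));  // 0.5
    }
}
```

In Lucene 2.9 itself, the switch would be setDiscountOverlaps(true) on the DefaultSimilarity used at indexing time, since norms are computed when documents are indexed; the sketch only mimics the arithmetic.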
Zambrano, I was too quick to respond to your idf explanation. I definitely
did not mean that idf and length-norms are the same thing.
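Since IDF and length norms are easy to conflate, here is a small sketch of the two classic DefaultSimilarity formulas side by side: idf = 1 + ln(numDocs / (docFreq + 1)) depends only on document counts, while lengthNorm = 1/sqrt(numTerms) depends only on how many tokens a field holds. Setting omitNorms on a field drops the second factor and leaves the first untouched. The class name and document counts are hypothetical.

```java
// Hypothetical sketch of the two classic DefaultSimilarity factors side by side.
// idf depends only on document counts; lengthNorm only on the field's token count.
final class IdfVsNorm {

    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // lengthNorm = 1 / sqrt(numTerms); omitNorms effectively replaces this with 1
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        int numDocs = 1000;                // made-up index size
        double rare = idf(9, numDocs);     // a term found in 9 documents
        double common = idf(599, numDocs); // a term (e.g. a 1-char gram) in 599 documents
        System.out.println(rare > common); // true
        // More tokens in a field lower its norm but leave every term's idf alone
        System.out.println(lengthNorm(3) > lengthNorm(20)); // true
    }
}
```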
Andrew, this is how I would have done it:
First, I would create a field type called prefix_text, as below, in my
schema.xml:
<fieldType name="prefix_text">