
A little help with indexing joined words

2009-10-05 Thread Andrew McCombe

Hi, I am hoping someone can point me in the right direction with regard to indexing words that are concatenated together to make other words or product names. We have indexed a product database and have come across some search terms where zero results are returned. There are products in the…

Re: A little help with indexing joined words

2009-10-05 Thread Avlesh Singh

We have indexed a product database and have come across some search terms where zero results are returned. There are products in the index with 'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for 'Borderland' or 'Border Land' and 'Dragon Fly' return zero results
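The zero-result searches follow from how a title is split into terms. The sketch below uses a simplified stand-in for a whitespace tokenizer plus lowercase filter, with an all-terms-required check standing in for the query. Both are assumptions, since the poster's actual analysis chain isn't shown. It also ignores stemming: with an English stemmer in the chain, "Borderlands" and "Borderland" may well reduce to the same term, but "Border Land" would still produce two terms, neither of which is "borderlands".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;

class JoinedWordMismatch {
    // Rough stand-in for a whitespace tokenizer followed by a lowercase filter (no stemming).
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // True only if every query term occurs among the indexed terms (all-terms-required matching).
    static boolean matchesAll(String indexedText, String query) {
        return new HashSet<>(analyze(indexedText)).containsAll(analyze(query));
    }

    public static void main(String[] args) {
        String title = "Borderlands xxx xxx";
        for (String q : new String[] {"Borderlands", "Borderland", "Border Land"}) {
            System.out.println(q + " -> " + analyze(q) + " matches: " + matchesAll(title, q));
        }
    }
}
```

Only the exact "Borderlands" query matches; "Borderland" and "Border Land" analyze to terms the title never produced.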

Re: A little help with indexing joined words

2009-10-05 Thread Christian Zambrano

Using synonyms might be a better solution, because the use of EdgeNGramTokenizerFactory has the potential of creating a large number of tokens, which will artificially increase the number of tokens in the index, which in turn will affect the IDF score. A query for borderland should have returned…
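To make the token-count concern concrete, here is a rough plain-Java sketch (not Lucene code) that emits front edge n-grams per word, as a whitespace tokenizer followed by an edge n-gram filter would, over four made-up titles. It then applies the idf formula used by Lucene 2.9's DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1)). The titles and the gram sizes 2 to 15 are arbitrary illustrative choices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class EdgeNGramIdf {
    // Front edge n-grams of one word, lengths minGram..maxGram (capped at the word length).
    static List<String> edgeNGrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, word.length()); n++) {
            grams.add(word.substring(0, n));
        }
        return grams;
    }

    // Same formula as idf() in Lucene 2.9's DefaultSimilarity.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        String[] titles = {"borderlands", "border patrol", "dragonfly", "dragon age"};
        Map<String, Integer> docFreq = new HashMap<>();
        int totalTokens = 0;
        for (String title : titles) {
            Set<String> termsInDoc = new HashSet<>(); // count each term once per document
            for (String word : title.split(" ")) {
                List<String> grams = edgeNGrams(word, 2, 15);
                totalTokens += grams.size();
                termsInDoc.addAll(grams);
            }
            for (String term : termsInDoc) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        System.out.println("tokens indexed: " + totalTokens);
        for (String term : new String[] {"bo", "border", "borderlands"}) {
            int df = docFreq.getOrDefault(term, 0);
            System.out.printf("%s df=%d idf=%.3f%n", term, df, idf(df, titles.length));
        }
    }
}
```

For these four titles the six words become 35 indexed tokens. Short prefixes such as "bo" are shared by every title starting with those letters, so they carry a lower idf than the full word "borderlands".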

Re: A little help with indexing joined words

2009-10-05 Thread Avlesh Singh

Using synonyms might be a better solution, because the use of EdgeNGramTokenizerFactory has the potential of creating a large number of tokens, which will artificially increase the number of tokens in the index, which in turn will affect the IDF score. Well, I don't see a reason as to why…
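The synonym alternative trades index size for a hand-maintained list. Here is a minimal sketch of query-side rewriting, assuming a hypothetical mapping from split or variant spellings to the joined form that is actually indexed. In Solr itself this kind of mapping would normally live in a synonyms file read by SynonymFilterFactory rather than in application code; the greedy two-word lookup here is only an illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class JoinedWordSynonyms {
    // Hypothetical hand-maintained list: split or variant spelling -> indexed joined form.
    static final Map<String, String> SYNONYMS = Map.of(
            "border land", "borderlands",
            "borderland", "borderlands",
            "dragon fly", "dragonfly");

    // Greedy rewrite: at each position try the two-word phrase first, then the single word.
    static List<String> rewrite(String query) {
        List<String> words = Arrays.asList(query.toLowerCase(Locale.ROOT).trim().split("\\s+"));
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            if (i + 1 < words.size()) {
                String pair = words.get(i) + " " + words.get(i + 1);
                if (SYNONYMS.containsKey(pair)) {
                    out.add(SYNONYMS.get(pair));
                    i += 2;
                    continue;
                }
            }
            out.add(SYNONYMS.getOrDefault(words.get(i), words.get(i)));
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Border Land"));
        System.out.println(rewrite("dragon fly game"));
    }
}
```

rewrite("Border Land") yields [borderlands] and rewrite("dragon fly game") yields [dragonfly, game]. The catch is that every compound has to be listed by hand.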

Re: A little help with indexing joined words

2009-10-05 Thread Christian Zambrano

Would you mind explaining how omitNorms has any effect on the IDF problem I described earlier? I agree with your second sentence. I had to use the NGramTokenFilter to accommodate partial matches. On 10/05/2009 12:11 PM, Avlesh Singh wrote: Using synonyms might be a better solution because…
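Where partial matches inside a word are needed, as with the NGramTokenFilter mentioned here, grams are taken from every position rather than only the front. A plain-Java sketch of that idea follows; it is an illustration, not Lucene's implementation, and the sizes 3 to 9 are arbitrary.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class NGramPartialMatch {
    // Every substring of length minGram..maxGram, starting at every position in the word.
    static Set<String> nGrams(String word, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        for (int n = minGram; n <= Math.min(maxGram, word.length()); n++) {
            for (int start = 0; start + n <= word.length(); start++) {
                grams.add(word.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        Set<String> indexed = nGrams("dragonfly", 3, 9);
        System.out.println(indexed.size() + " grams");
        System.out.println("fly matches: " + indexed.contains("fly"));
        System.out.println("dragon matches: " + indexed.contains("dragon"));
    }
}
```

For "dragonfly" this yields 28 grams, so a query for "fly" can hit the middle of the word, at the price of an even larger token count than front-only edge n-grams.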

Re: A little help with indexing joined words

2009-10-05 Thread Robert Muir

FYI, if you don't want to turn off norms entirely, try this option in Lucene 2.9 DefaultSimilarity: public void setDiscountOverlaps(boolean v). Determines whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm. By default this is false, meaning overlap tokens…
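What setDiscountOverlaps changes can be sketched with the length normalization in Lucene 2.9's DefaultSimilarity, 1 / sqrt(numTerms), where numTerms either includes or excludes the overlap tokens. This plain-Java sketch skips field boosts and the lossy single-byte encoding Lucene applies to stored norms. The token counts are a made-up example: a three-word title with two synonyms stacked at position increment 0.

```java
class LengthNormSketch {
    // lengthNorm as in Lucene 2.9 DefaultSimilarity: 1 / sqrt(number of terms).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Mirrors the setDiscountOverlaps switch: optionally leave out tokens
    // whose position increment was 0 before computing the norm.
    static double norm(int length, int numOverlap, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // "Borderlands xxx xxx" plus two synonyms stacked on "borderlands".
        int length = 5;
        int numOverlap = 2;
        System.out.printf("without discount: %.3f%n", norm(length, numOverlap, false));
        System.out.printf("with discount:    %.3f%n", norm(length, numOverlap, true));
    }
}
```

With five tokens of which two overlap, the norm is about 0.447 without the discount and about 0.577 with it, the same value the field would get with no stacked tokens at all.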

Re: A little help with indexing joined words

2009-10-05 Thread Avlesh Singh

Zambrano, I was too quick to respond to your idf explanation. I definitely did not mean that idf and length norms are the same thing. Andrew, this is how I would have done it. First, I would create a field called prefix_text as follows in my schema.xml: <fieldType name="prefix_text" …
