Re: Custom Tokenizer/Analyzer

2014-02-20 Thread Ahmet Arslan
Hi Geet, I suggest doing this kind of transformation at query time only. Don't interfere with the index. This way is more flexible: you can disable/enable it on the fly and change your list without re-indexing. Just an imaginary example: when a user passes a string such as International Businessma
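To make the "query time only" idea concrete, here is a minimal, stdlib-only sketch (not code from this thread). It rewrites the user's query string before it is handed to a query parser, leaving the index untouched. The class `QueryRewriter` and its methods are hypothetical names; the output uses classic Lucene query-parser syntax (`OR`, parentheses, quoted phrases). Because the expansion list lives in memory, it can be toggled or replaced without re-indexing.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of query-time rewriting. Only the query string changes;
 * the index is never touched, so the list can change without re-indexing.
 */
final class QueryRewriter {
    // Lower-cased single term -> alternative query text to OR with it.
    private final Map<String, String> expansions = new ConcurrentHashMap<>();
    // Lets the rewriting be switched on or off on the fly.
    private volatile boolean enabled = true;

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    /** Registers an alternative, e.g. "IBM" -> "\"International Business Machine\"". */
    void put(String term, String alternative) {
        expansions.put(term.toLowerCase(Locale.ROOT), alternative);
    }

    /** Drops the whole list, e.g. before loading a new one. */
    void clear() {
        expansions.clear();
    }

    /** Rewrites each whitespace-separated term into (term OR alternative) when one is registered. */
    String rewrite(String userQuery) {
        if (!enabled) {
            return userQuery;
        }
        StringBuilder sb = new StringBuilder();
        for (String term : userQuery.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            String alt = expansions.get(term.toLowerCase(Locale.ROOT));
            sb.append(alt == null ? term : "(" + term + " OR " + alt + ")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        QueryRewriter rewriter = new QueryRewriter();
        rewriter.put("IBM", "\"International Business Machine\"");
        // Prints: (IBM OR "International Business Machine") logo
        System.out.println(rewriter.rewrite("IBM logo"));
        rewriter.setEnabled(false);
        // Prints: IBM logo
        System.out.println(rewriter.rewrite("IBM logo"));
    }
}
```

This sketch only looks up single terms; recognizing multi-word user input such as "International Business machine" would need a phrase-level lookup as well.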

Re: Custom Tokenizer/Analyzer

2014-02-20 Thread Michael McCandless
If you already know the set of phrases you need to detect, then you can use Lucene's SynonymFilter to spot them and insert a new token.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Feb 20, 2014 at 7:21 AM, Benson Margulies wrote:
> It sounds like you've been asked to implement Named E
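As an illustration of spotting a known set of phrases and emitting one token for each, here is a plain-Java sketch of greedy longest-match phrase joining. It is an assumption-laden stand-in, not Lucene's SynonymFilter (which is built from a SynonymMap and works on a TokenStream); the class `PhraseJoiner`, its methods, and the combined token name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical stdlib-only sketch: replaces known multi-word phrases in a
 * token list with a single combined token, using greedy longest match.
 */
final class PhraseJoiner {
    // Lower-cased phrase (words joined by single spaces) -> token to emit.
    private final Map<String, String> phrases = new HashMap<>();
    // Length in words of the longest registered phrase.
    private int maxPhraseWords = 1;

    void addPhrase(String phrase, String token) {
        String key = phrase.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
        phrases.put(key, token);
        maxPhraseWords = Math.max(maxPhraseWords, key.split(" ").length);
    }

    /** Scans left to right; at each position tries the longest candidate phrase first. */
    List<String> join(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String match = null;
            int matchLen = 0;
            for (int len = Math.min(maxPhraseWords, tokens.size() - i); len >= 1; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len)).toLowerCase(Locale.ROOT);
                String token = phrases.get(candidate);
                if (token != null) {
                    match = token;
                    matchLen = len;
                    break;
                }
            }
            if (match != null) {
                out.add(match);   // the whole phrase becomes one token
                i += matchLen;
            } else {
                out.add(tokens.get(i)); // unknown word passes through unchanged
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PhraseJoiner joiner = new PhraseJoiner();
        joiner.addPhrase("International Business Machine", "international_business_machine");
        joiner.addPhrase("IBM", "international_business_machine");
        // Prints: [international_business_machine, logo]
        System.out.println(joiner.join(Arrays.asList("International", "Business", "machine", "logo")));
        // Prints: [international_business_machine, logo]
        System.out.println(joiner.join(Arrays.asList("IBM", "logo")));
    }
}
```

Trying the longest candidate first is what lets a full phrase win over any shorter entry that starts at the same word. The real filter also has to deal with token positions and can keep the original tokens alongside the new one; this sketch simply replaces the matched words.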

Re: Custom Tokenizer/Analyzer

2014-02-20 Thread Benson Margulies
It sounds like you've been asked to implement Named Entity Recognition. OpenNLP has some capability here. There are also, um, commercial alternatives.

On Thu, Feb 20, 2014 at 6:24 AM, Yann-Erwan Perio wrote:
> On Thu, Feb 20, 2014 at 10:46 AM, Geet Gangwar
> wrote:
> > Hi,
>
> > My requirement

Re: Custom Tokenizer/Analyzer

2014-02-20 Thread Yann-Erwan Perio
On Thu, Feb 20, 2014 at 10:46 AM, Geet Gangwar wrote:
> Hi,
>
> My requirement is it should have capabilities to match multiple words as
> one token. for example. When user passes String as International Business
> machine logo or IBM logo it should return International Business Machine as
> one tok

Custom Tokenizer/Analyzer

2014-02-20 Thread Geet Gangwar
Hi, I have a requirement to write a custom tokenizer using the Lucene framework. My requirement is that it should be able to match multiple words as one token. For example, when a user passes a string such as International Business machine logo or IBM logo, it should return International Business Machine