RE: Custom tokenizer

2015-01-12 Thread Uwe Schindler
> Thanks for the reply. Hmm, I understand. I know about AnalyzerWrapper, but that is not what I am looking for. I also know about cloning and overriding. I want my analyzer to behave exactly the same as EnglishAnalyzer and right now I am copying the code from the EnglishAnalyzer to

Re: Custom tokenizer

2015-01-12 Thread Vihari Piratla
Thanks for the reply. Hmm, I understand. I know about AnalyzerWrapper, but that is not what I am looking for. I also know about cloning and overriding. I want my analyzer to behave exactly the same as EnglishAnalyzer and right now I am copying the code from the EnglishAnalyzer to mimic the behavi

RE: Custom tokenizer

2015-01-12 Thread Uwe Schindler
Hi, Extending an existing Analyzer is not useful, because it is just a factory that returns a TokenStream instance to consumers. If you want to change the Tokenizer of an existing Analyzer, just clone it and rewrite its createComponents() method, see the example in the Javadocs: http://lucene.
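Uwe's point, that an Analyzer is only a factory wiring a Tokenizer to a chain of filters, can be illustrated with a small plain-Java model. This is a toy and not Lucene's API: `ToyAnalyzer`, `LetterDigitAnalyzer` and `WhitespaceEnglishAnalyzer` are hypothetical names, the tokenizers are regex stand-ins, and the filter chain is only loosely modeled on EnglishAnalyzer's possessive, lower-case and stop-word steps (stemming is left out). It shows that keeping the same filters and swapping only the tokenizer means writing a new factory, which is what copying `createComponents()` amounts to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Toy model in plain Java, NOT Lucene's API: an "analyzer" is just a factory
// that wires one tokenizer to a chain of filters, the role createComponents()
// plays in a real Lucene Analyzer.
abstract class ToyAnalyzer {
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    // Subclasses only choose the tokenizer.
    protected abstract Function<String, List<String>> tokenizer();

    // Simplified English-like chain (no stemming): possessive, lower-case, stop words.
    protected List<UnaryOperator<List<String>>> filters() {
        return List.of(ToyAnalyzer::stripPossessive, ToyAnalyzer::lowerCase, ToyAnalyzer::removeStopWords);
    }

    final List<String> analyze(String text) {
        List<String> tokens = tokenizer().apply(text);
        for (UnaryOperator<List<String>> filter : filters()) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    // Drops a trailing "'s" / "'S" (ASCII apostrophe only).
    static List<String> stripPossessive(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            out.add(t.endsWith("'s") || t.endsWith("'S") ? t.substring(0, t.length() - 2) : t);
        }
        return out;
    }

    static List<String> lowerCase(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    static List<String> removeStopWords(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) if (!STOP_WORDS.contains(t)) out.add(t);
        return out;
    }

    // Splits on the regex and drops empty pieces.
    static List<String> split(String text, String regex) {
        List<String> out = new ArrayList<>();
        for (String t : text.split(regex)) if (!t.isEmpty()) out.add(t);
        return out;
    }
}

// Rough stand-in for a default tokenizer: breaks on anything that is not a
// letter, digit or apostrophe.
class LetterDigitAnalyzer extends ToyAnalyzer {
    @Override
    protected Function<String, List<String>> tokenizer() {
        return text -> split(text, "[^\\p{L}\\p{N}']+");
    }
}

// Same filter chain as above; only the tokenizer is swapped for whitespace splitting.
class WhitespaceEnglishAnalyzer extends ToyAnalyzer {
    @Override
    protected Function<String, List<String>> tokenizer() {
        return text -> split(text, "\\s+");
    }
}
```

For the input `"The e-mail of IBM's CEO"`, the letter/digit variant yields `[e, mail, ibm, ceo]` while the whitespace variant yields `[e-mail, ibm, ceo]`: same filters, different token boundaries.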

Re: Custom Tokenizer/Analyzer

2014-02-20 Thread Ahmet Arslan
Hi Geet, I suggest you do this kind of transformation at query time only. Don't interfere with the index. This way is more flexible: you can disable/enable it on the fly and change your list without re-indexing. Just an imaginary example: When user passes String as International Businessma
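The query-time approach can be sketched as a rewrite step that runs before the query reaches the index. This is a toy in plain Java, not a Lucene API: `QueryTimeExpander` and the phrase table are hypothetical, matching is a case-insensitive whole-word regex, and the output is just a list of alternative query strings. How those alternatives are combined (for example as OR-ed clauses) is left open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy sketch, NOT a Lucene API: rewrite the user's query at search time from an
// editable phrase table, so the index never has to change.
final class QueryTimeExpander {
    // Hypothetical table: lower-cased phrase -> alternative phrasings.
    private final Map<String, List<String>> table = new LinkedHashMap<>();
    // Expansion can be switched off on the fly without touching the index.
    private boolean enabled = true;

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    void add(String phrase, String alternative) {
        table.computeIfAbsent(phrase.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(alternative);
    }

    // Returns the original query followed by one variant per alternative whose
    // phrase occurs in the query as whole words (case-insensitive).
    List<String> expand(String query) {
        List<String> variants = new ArrayList<>();
        variants.add(query);
        if (!enabled) return variants;
        for (Map.Entry<String, List<String>> e : table.entrySet()) {
            Matcher m = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b",
                    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(query);
            if (!m.find()) continue;
            for (String alt : e.getValue()) {
                // replaceAll resets the matcher, so it can be reused per alternative.
                variants.add(m.replaceAll(Matcher.quoteReplacement(alt)));
            }
        }
        return variants;
    }
}
```

With `add("IBM", "International Business Machine")`, expanding `"IBM logo"` gives `["IBM logo", "International Business Machine logo"]`; editing the table or calling `setEnabled(false)` changes behavior immediately, with no re-indexing.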

Re: Custom Tokenizer/Analyzer

2014-02-20 Thread Michael McCandless
If you already know the set of phrases you need to detect then you can use Lucene's SynonymFilter to spot them and insert a new token. Mike McCandless http://blog.mikemccandless.com On Thu, Feb 20, 2014 at 7:21 AM, Benson Margulies wrote: > It sounds like you've been asked to implement Named E
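The idea behind spotting known phrases and inserting a new token can be pictured with a toy phrase spotter. This is not SynonymFilter's implementation or API: `PhraseSpotter` and the `ibm_entity` token are hypothetical. It assumes the phrase list is known up front (as Mike notes), keys are lower-case and space-separated, matching is case-insensitive over whitespace-separated tokens, and the longest phrase starting at each position wins. The injected token is given the same position as the phrase's first word while the original words are kept, which is the general notion of stacking an extra token on the originals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy sketch, NOT Lucene's SynonymFilter: scan tokens for known phrases and
// inject one extra token at the phrase's first position, keeping the originals.
final class PhraseSpotter {
    // Hypothetical dictionary: lower-case, space-joined phrase -> injected token.
    private final Map<String, String> phrases;
    private final int maxPhraseWords;

    PhraseSpotter(Map<String, String> phrases) {
        this.phrases = phrases;
        int max = 1;
        for (String p : phrases.keySet()) max = Math.max(max, p.split(" ").length);
        this.maxPhraseWords = max;
    }

    // Returns "token@position" strings so stacked tokens are easy to see.
    List<String> process(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i) + "@" + i);
            // Try the longest candidate phrase starting at i first.
            for (int len = Math.min(maxPhraseWords, tokens.size() - i); len >= 1; len--) {
                String key = String.join(" ", tokens.subList(i, i + len)).toLowerCase(Locale.ROOT);
                String injected = phrases.get(key);
                if (injected != null) {
                    out.add(injected + "@" + i); // same position as the first word
                    break;
                }
            }
        }
        return out;
    }
}
```

With the dictionary `{"international business machine" -> "ibm_entity"}`, the tokens `International Business Machine logo` come out as `International@0, ibm_entity@0, Business@1, Machine@2, logo@3`. Each start position is checked independently, so overlapping dictionary entries could both fire; a real filter has to decide how to handle that.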

Re: Custom Tokenizer/Analyzer

2014-02-20 Thread Benson Margulies
It sounds like you've been asked to implement Named Entity Recognition. OpenNLP has some capability here. There are also, um, commercial alternatives. On Thu, Feb 20, 2014 at 6:24 AM, Yann-Erwan Perio wrote: > On Thu, Feb 20, 2014 at 10:46 AM, Geet Gangwar wrote: >> Hi, >> My requirement

Re: Custom Tokenizer/Analyzer

2014-02-20 Thread Yann-Erwan Perio
On Thu, Feb 20, 2014 at 10:46 AM, Geet Gangwar wrote: > Hi, > My requirement is that it should be able to match multiple words as one token. For example, when the user passes a string such as International Business machine logo or IBM logo, it should return International Business Machine as one tok

Re: Custom Tokenizer

2013-12-05 Thread Erick Erickson
You can also string together any of a myriad of TokenFilters, see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters I'd recommend spending some time on the admin/analysis page to understand what all the combinations do. I'd also recommend against dealing with punctuation etc. by using wi

Re: Custom Tokenizer

2013-12-05 Thread Furkan KAMACI
Hi; StandardAnalyzer includes these by default: StandardFilter, LowerCaseFilter and StopFilter (applied after StandardTokenizer). You can also consider char filters. Did you read this page: https://cwiki.apache.org/confluence/display/solr/CharFilterFactories Thanks; Furkan KAMACI 2013/12/5 > Hi, > I have used StandardAnalyzer in
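Char filters act on the raw characters before the tokenizer sees them. The toy below illustrates that ordering in plain Java; it is not Lucene's or Solr's char filter API. `ToyMappingCharFilter` and the example mappings are hypothetical, the sketch applies the longest matching source string at each position, and it ignores offsets (real char filters also correct offsets so that things like highlighting still point into the original text).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch in the spirit of a mapping char filter (NOT Lucene's class):
// rewrite the raw text *before* tokenization. Offsets are not tracked.
final class ToyMappingCharFilter {
    private final Map<String, String> mappings = new LinkedHashMap<>();

    ToyMappingCharFilter map(String from, String to) {
        // An empty source string would never advance the scan below.
        if (from.isEmpty()) throw new IllegalArgumentException("empty mapping source");
        mappings.put(from, to);
        return this;
    }

    // At each position, applies the longest matching source string, if any.
    String filter(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            String best = null;
            for (String from : mappings.keySet()) {
                if (input.startsWith(from, i) && (best == null || from.length() > best.length())) {
                    best = from;
                }
            }
            if (best == null) {
                out.append(input.charAt(i));
                i++;
            } else {
                out.append(mappings.get(best));
                i += best.length();
            }
        }
        return out.toString();
    }

    // Simple whitespace tokenizer, used only to show the effect downstream.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) if (!t.isEmpty()) out.add(t);
        return out;
    }
}
```

With `map("&", " and ")` and `map("e-mail", "email")`, the text `"R&D e-mail"` becomes `"R and D email"` before tokenization, so a whitespace tokenizer then produces `[R, and, D, email]` without any token filter having to know about `&` or hyphens.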