Thanks for the reply.
Hmm, I understand.
I know about AnalyzerWrapper, but that is not what I am looking for.
I also know about cloning and overriding. I want my analyzer to behave
exactly the same as EnglishAnalyzer and right now I am copying the code
from the EnglishAnalyzer to mimic the behavi
Hi,
Extending an existing Analyzer is not useful, because it is just a factory that
returns a TokenStream instance to consumers. If you want to change the
Tokenizer of an existing Analyzer, just clone it and rewrite its
createComponents() method. See the example in the Javadocs:
http://lucene.
Hi Geet,
I suggest you do this kind of transformation at query time only. Don't
interfere with the index. This way is more flexible: you can enable or disable
it on the fly and change your list without re-indexing.
Just an imaginary example: when the user passes the string International
Businessma
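The query-time approach suggested above could look roughly like the sketch below. It is plain Java with no Lucene classes, and the class name QueryRewriter, its methods, and the alias entry are all hypothetical, made up for illustration. The point it tries to show is that the alias list lives outside the index, so it can be edited or switched off without re-indexing.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Plain-Java illustration only; no Lucene classes are used here.
class QueryRewriter {
    // Alias -> canonical phrase. Hypothetical entries, editable without re-indexing.
    private final Map<String, String> aliases = new LinkedHashMap<>();
    private boolean enabled = true;

    void addAlias(String alias, String canonical) {
        aliases.put(alias.toLowerCase(Locale.ROOT), canonical);
    }

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    // Replaces each whitespace-separated word that is a known alias
    // with its canonical phrase in double quotes; other words pass through.
    String rewrite(String query) {
        if (!enabled) {
            return query;
        }
        StringBuilder sb = new StringBuilder();
        for (String word : query.trim().split("\\s+")) {
            String canonical = aliases.get(word.toLowerCase(Locale.ROOT));
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(canonical != null ? "\"" + canonical + "\"" : word);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        QueryRewriter r = new QueryRewriter();
        r.addAlias("IBM", "International Business Machine");
        System.out.println(r.rewrite("IBM logo")); // "International Business Machine" logo
        r.setEnabled(false);
        System.out.println(r.rewrite("IBM logo")); // IBM logo
    }
}
```

The rewritten string would then be handed to whatever query parser is in use; how the quoted phrase is interpreted depends on that parser and is not covered by this sketch.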
If you already know the set of phrases you need to detect then you can
use Lucene's SynonymFilter to spot them and insert a new token.
Mike McCandless
http://blog.mikemccandless.com
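To make the idea of spotting a known phrase and emitting it as one token concrete, here is a minimal plain-Java sketch. It is not Lucene's SynonymFilter API and does not use any Lucene classes; the class name PhraseMerger and the phrase list are hypothetical. It only illustrates longest-match merging over whitespace-split, lower-cased words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; this is not Lucene's SynonymFilter API.
class PhraseMerger {
    // Known phrases, each split into lower-case words (hypothetical list).
    private final List<List<String>> phrases = new ArrayList<>();

    PhraseMerger(List<String> knownPhrases) {
        for (String p : knownPhrases) {
            phrases.add(Arrays.asList(p.trim().toLowerCase(Locale.ROOT).split("\\s+")));
        }
    }

    // Splits on whitespace, then replaces each known phrase with a single token.
    List<String> tokenize(String text) {
        String[] words = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            int best = 0; // length of the longest known phrase starting at position i
            for (List<String> p : phrases) {
                if (p.size() > best && matchesAt(words, i, p)) {
                    best = p.size();
                }
            }
            if (best > 0) {
                // Emit the whole phrase as one token and skip past it.
                out.add(String.join(" ", Arrays.copyOfRange(words, i, i + best)));
                i += best;
            } else {
                out.add(words[i]);
                i++;
            }
        }
        return out;
    }

    private static boolean matchesAt(String[] words, int start, List<String> phrase) {
        if (start + phrase.size() > words.length) {
            return false;
        }
        for (int k = 0; k < phrase.size(); k++) {
            if (!words[start + k].equals(phrase.get(k))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PhraseMerger m = new PhraseMerger(Arrays.asList("international business machine"));
        // prints [international business machine, logo]
        System.out.println(m.tokenize("International Business Machine logo"));
    }
}
```

In a real analysis chain the same matching would happen inside a TokenFilter working on the token stream, with positions and offsets kept consistent; the sketch leaves all of that out.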
On Thu, Feb 20, 2014 at 7:21 AM, Benson Margulies wrote:
> It sounds like you've been asked to implement Named E
It sounds like you've been asked to implement Named Entity Recognition.
OpenNLP has some capability here. There are also, um, commercial
alternatives.
On Thu, Feb 20, 2014 at 6:24 AM, Yann-Erwan Perio wrote:
> On Thu, Feb 20, 2014 at 10:46 AM, Geet Gangwar
> wrote:
>
> Hi,
>
> > My requirement
On Thu, Feb 20, 2014 at 10:46 AM, Geet Gangwar wrote:
Hi,
> My requirement is that it should be able to match multiple words as
> one token. For example, when the user passes the string "International Business
> Machine logo" or "IBM logo", it should return International Business Machine as
> one tok
You can also string together any of the myriad TokenFilters, see:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
I'd recommend spending some time on the admin/analysis page
to understand what all the combinations do. I'd also recommend
against dealing with punctuation etc by using wi
Hi;
StandardAnalyzer includes these filters by default:
StandardFilter, LowerCaseFilter and StopFilter.
You could consider char filters. Have you read this page:
https://cwiki.apache.org/confluence/display/solr/CharFilterFactories
Thanks;
Furkan KAMACI
2013/12/5
> Hi,
>
> I have used StandardAnalyzer in