Re: search ignoring accents

Ahmet Arslan Fri, 17 Apr 2015 10:08:59 -0700

Hi Pedro,

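The requirement that filtering by "edr" should return "Pedro" can be met by
expanding terms at index time only, using the plain n-gram filter
(solr.NGramFilterFactory), which, unlike EdgeNGram, also emits grams from the
middle of a word.
You can remove the n-gram filter from the query analyzer.
But remember that the n-gram filter produces a lot of tokens; try it on the
Analysis page first.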
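A minimal sketch of that setup, assuming Solr's stock solr.NGramFilterFactory and a hypothetical field type name `text_general_ngram`; the gram sizes are illustrative, not tuned. Grams are generated only by the index analyzer, so the query "edr" stays a single token and is matched against the indexed grams of "pedro" (pe, ed, dr, ro, ped, edr, dro, ...). ASCII folding is included on both sides because accents are the topic of the thread.

```xml
<!-- Hypothetical field type name; gram sizes are illustrative only. -->
<fieldType name="text_general_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- Fold accents first so "Mourão" is indexed as "mourao". -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- Every substring of 2..15 chars, including infixes such as "edr" from "pedro". -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same normalization as the index side, but no n-gram expansion. -->
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain a query shorter than minGramSize or longer than maxGramSize will not match any indexed gram, so the sizes bound what users can type. Expect the index to grow noticeably, as noted above.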


Regarding grams starting at the beginning or the end, there is an
EdgeNGramTokenFilter where you can specify the side, front or back.
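For illustration, an index analyzer that produces suffix grams might look like the fragment below. As far as I know, the `side="back"` attribute is only accepted by older Lucene/Solr releases (it was deprecated during the 4.x line), so check your version before relying on it; it is shown commented out for that reason. The active filters use a different, commonly used technique, not the one described above: reverse each token, take front edge n-grams, and reverse them back, which yields the same suffixes without needing `side`.

```xml
<analyzer type="index">
  <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <!-- Older releases only: side="back" emits suffixes such as "ro", "dro", "edro". -->
  <!-- <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="back"/> -->
  <!-- Alternative: "pedro" -> "ordep" -> or, ord, orde, ordep -> ro, dro, edro, pedro. -->
  <filter class="solr.ReverseStringFilterFactory"/>
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  <filter class="solr.ReverseStringFilterFactory"/>
</analyzer>
```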

Ahmet




On Friday, April 17, 2015 2:50 PM, Pedro Figueiredo 
<pjlfigueir...@criticalsoftware.com> wrote:
And for this example, what filter should I use?

Filtering by "edr" should return "Pedro".
Does NGram create tokens starting at the beginning, at the end, and in the
middle?

Thanks!

Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150


Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU




-----Original Message-----
From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com] 
Sent: 17 April 2015 12:22
To: solr-user@lucene.apache.org; 'Ahmet Arslan'
Subject: RE: search ignoring accents

Hi Ahmet,

Yes, EdgeNGram is what produces those results.
I need it to improve name searches for the application's users.

Thanks.

Pedro Figueiredo



-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: 17 April 2015 12:01
To: solr-user@lucene.apache.org
Subject: Re: search ignoring accents

Hi Pedro,

solr.ASCIIFoldingFilterFactory is one way to remove diacritics.
The confusing results come from EdgeNGram. Why do you need it?
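A minimal sketch of how the field type quoted below could be adjusted, assuming prefix matching on names is still wanted. In that definition the query analyzer also applies EdgeNGram, so "mourao" is searched as "mo", "mou", "mour", and so on; the shared gram "mo" is most likely what pulls in "Monteiro" and "Morais", consistent with the diagnosis above. Folding accents on both sides and generating edge n-grams only at index time avoids that.

```xml
<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- "Mourão" becomes "mourao" before grams are built. -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- Prefixes only: mo, mou, mour, moura, mourao. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- Same folding, but the query term stays whole, so "mourao" cannot match via "mo". -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the index analyzer changes, existing documents have to be re-indexed before the new behavior shows up.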

Ahmet



On Friday, April 17, 2015 1:38 PM, Pedro Figueiredo 
<pjlfigueir...@criticalsoftware.com> wrote:



Hello,

What is the best way to search in a field ignoring accents?

The field has the type:
    <fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
    </fieldType>

I’ve tried adding the filter <filter class="solr.ASCIIFoldingFilterFactory"/>,
but got some strange results, for example:

Searching for “Mourao” returned:
Mourão -> OK
Monteiro -> NOT OK
Morais -> NOT OK

Thanks in advance,

Pedro Figueiredo
