RE: Problem with first letter accented

Ahmet Arslan Fri, 08 Jul 2011 00:58:41 -0700

Hello,

As I see from analysis.jsp, your á is not being converted to 'a' by the ASCII 
folding filter. For some reason it is already recognized as two characters, 
'Ã¡', before it reaches the ASCII folding filter.
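
The two-character form is what you get when the UTF-8 bytes of á are decoded 
as ISO-8859-1. Below is a minimal Python sketch of that effect, not Solr code. 
The helper names (misdecode, fold, edge_ngrams, analyze) are hypothetical, 
fold() is only a rough stand-in for ASCIIFoldingFilterFactory, and it assumes 
that the text was decoded correctly at index time (which would fit "agora" 
finding the title) and that query terms are ORed together.

```python
import unicodedata

def misdecode(text):
    """Encode as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")

def fold(text):
    """Rough stand-in for ASCII folding: drop combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def edge_ngrams(term, min_size=2, max_size=15):
    """Front edge n-grams, mirroring minGramSize=2 and maxGramSize=15."""
    return [term[:n] for n in range(min_size, min(max_size, len(term)) + 1)]

def analyze(text):
    """Lowercase, fold, then build edge n-grams (same order as the field type)."""
    return edge_ngrams(fold(text.lower()))

# Grams in the index, assuming the title words were decoded correctly.
indexed = set(analyze("ágora")) | set(analyze("imágenes"))

for query in ("ágora", "imágenes"):
    grams = analyze(misdecode(query))  # query text arrives garbled
    print(query, "->", grams, "| shared with index:",
          sorted(indexed.intersection(grams)))
```

Under these assumptions the garbled "ágora" yields only grams starting with 
"a¡", which share nothing with the index, while the garbled "imágenes" still 
yields "im" and "ima". That would explain why only words whose first letter is 
accented return no results.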


First of all, I would check the URI encoding of the servlet container. It 
should be UTF-8. See Tomcat's config:
http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
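
To see why the container's URI charset matters, here is a small Python sketch 
(illustrative only, it does not touch Tomcat or Solr). It assumes the client 
percent-encodes the query as UTF-8 and compares decoding that query string as 
UTF-8 with decoding it as ISO-8859-1.

```python
from urllib.parse import quote, unquote

# What a client sends for q=ágora when it percent-encodes as UTF-8.
encoded = quote("ágora")

# Decoding with the matching charset restores the original word.
as_utf8 = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 turns á into two characters.
as_latin1 = unquote(encoded, encoding="iso-8859-1")

print(encoded)
print(as_utf8)
print(as_latin1)
```

If the container behaves like the second decode, the analyzer receives the 
garbled term before any filter runs, which matches the analysis.jsp output in 
the quoted message.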

This is not related to the issue, but I recommend using 1.4 as the schema version:
<schema name="ca_objects" version="1.4">


--- On Fri, 7/8/11, Villacorta Peral, Eva <e.villaco...@ibermatica.com> wrote:

> From: Villacorta Peral, Eva <e.villaco...@ibermatica.com>
> Subject: RE: Problem with first letter accented
> To: solr-user@lucene.apache.org
> Date: Friday, July 8, 2011, 10:26 AM
> I'm using collectiveaccess, and its DB structure. Perhaps this is useful...
> 
> My type definition is:
> 
> <schema name="ca_objects" version="1.1">
>     <types>
>         <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>             <analyzer>
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.ASCIIFoldingFilterFactory"/>
>                 <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
>             </analyzer>
>         </fieldType>
>         <fieldType name="string" class="solr.StrField" />
>         <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
>     </types>
> 
> And the analysis of "ágora" is:
> 
> Index Analyzer
> org.apache.solr.analysis.WhitespaceTokenizerFactory
> {luceneMatchVersion=LUCENE_24}
> position     1 
> term text     Ã¡gora 
> startOffset 0 
> endOffset     6 
> 
> org.apache.solr.analysis.LowerCaseFilterFactory
> {luceneMatchVersion=LUCENE_24}
> position     1 
> term text     ã¡gora 
> startOffset 0 
> endOffset     6 
> 
> org.apache.solr.analysis.ASCIIFoldingFilterFactory
> {luceneMatchVersion=LUCENE_24}
> position     1 
> term text     a¡gora 
> startOffset 0 
> endOffset     6 
> 
> org.apache.solr.analysis.EdgeNGramFilterFactory
> {maxGramSize=15, side=front, minGramSize=2, luceneMatchVersion=LUCENE_24}
> position     1     2      3       4        5
> term text    a¡    a¡g    a¡go    a¡gor    a¡gora
> startOffset  0     0      0       0        0
> endOffset    2     3      4       5        6
> 
> Query Analyzer
> org.apache.solr.analysis.WhitespaceTokenizerFactory
> {luceneMatchVersion=LUCENE_24}
> position     1 
> term text     Ã¡gora 
> startOffset 0 
> endOffset     6 
> 
> org.apache.solr.analysis.LowerCaseFilterFactory
> {luceneMatchVersion=LUCENE_24}
> position     1 
> term text     ã¡gora 
> startOffset 0 
> endOffset     6 
> 
> org.apache.solr.analysis.ASCIIFoldingFilterFactory
> {luceneMatchVersion=LUCENE_24}
> position     1 
> term text     a¡gora 
> startOffset 0 
> endOffset     6 
> 
> org.apache.solr.analysis.EdgeNGramFilterFactory
> {maxGramSize=15, side=front, minGramSize=2, luceneMatchVersion=LUCENE_24}
> position     1     2      3       4        5
> term text    a¡    a¡g    a¡go    a¡gor    a¡gora
> startOffset  0     0      0       0        0
> endOffset    2     3      4       5        6
> 
> I hope this was enough... Ask me whatever you need. Thx
> 
> 
> > I'm using Solr 3.3 for searching in different languages, one of
> > them is Spanish. The ASCIIFoldingFilterFactory works fine, but if a
> > word begins with an accented letter, like "ágora" or "ínclito", it
> > can't find anything. I have to search for the word without the
> > accent in order to find any result. For instance:
> >
> > - Title: Imágenes del ágora de la plaza central.
> >
> > - Searching text: "imágenes" or "imagenes" returns the same result,
> >   the title above
> >
> > - Searching text: "ágora" returns no results, while "agora" returns
> >   the right result
> 
> That's quite strange. Your field type definition would be needed.
>
> Also, admin/analysis.jsp shows the step-by-step output of analysis.
> What happens to the words "ágora" or "ínclito" at index time and
> query time?
>
