RE: Problem with first letter accented

Villacorta Peral, Eva Fri, 08 Jul 2011 02:15:18 -0700

I'm sorry if this mail is a duplicate; my mail server gave me an error.

Hi!


I've changed server.xml to add the URI encoding, changed the schema version 
to 1.4, and reindexed my DB, but nothing has changed.

In analysis.jsp I searched for "más" to see what happens with that word, and 
its accented letter is also recognized as two characters, just like in 
"ágora". Yet searching for "más" works.

Could the order in which the filters are applied be relevant? I haven't read 
anything about it, but...
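
One way to probe both observations above is to replay the analysis chain offline. The Python sketch below is only an approximation under stated assumptions, not Solr's actual code: `fold` strips combining accents as a rough stand-in for ASCIIFoldingFilterFactory, `misdecode` models the suspected fault (UTF-8 bytes read as ISO-8859-1), and it assumes the indexed text was decoded correctly while only the query text was garbled, with a query matching when any of its grams is in the index. All function names are hypothetical.

```python
import unicodedata

def misdecode(text):
    """Hypothetical fault: UTF-8 bytes read as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")

def fold(token):
    """Rough stand-in for ASCII folding: drop combining accents only."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def edge_ngrams(token, min_size=2, max_size=15):
    """Front edge n-grams with the sizes from the quoted schema."""
    return [token[:n] for n in range(min_size, min(max_size, len(token)) + 1)]

def analyze(text):
    """Whitespace split, lowercase, fold, then edge n-grams, in schema order."""
    grams = []
    for token in text.split():
        grams.extend(edge_ngrams(fold(token.lower())))
    return grams

for word in ["m\u00e1s", "\u00e1gora"]:
    indexed = set(analyze(word))             # assumed: index side decoded correctly
    queried = set(analyze(misdecode(word)))  # assumed: query side mis-decoded
    print(word, sorted(indexed & queried))
```

Under these assumptions "más" still shares the gram "ma" with the index, because the garbling starts after the second character, while "ágora" shares none, because its very first gram already contains the stray "¡". In this model, reordering the filters would not help, since the text is already damaged before the tokenizer sees it.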



-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Friday, July 8, 2011 9:57
To: solr-user@lucene.apache.org
Subject: RE: Problem with first letter accented


Hello,

As I see from analysis.jsp, your letter á is not converted to 'a' by the ASCII 
folding filter. For some reason it is recognized as two characters, 'Ã¡', 
before it reaches ASCII folding.
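
A two-character 'Ã¡' is what appears when the UTF-8 bytes of 'á' are decoded as ISO-8859-1. The short Python sketch below reproduces that effect; it only illustrates the suspected cause and is not taken from Solr or Tomcat, and the function name `misdecode` is hypothetical.

```python
def misdecode(text):
    """Encode as UTF-8, then (wrongly) decode the bytes as ISO-8859-1."""
    utf8_bytes = text.encode("utf-8")       # 'á' becomes two bytes: 0xC3 0xA1
    return utf8_bytes.decode("iso-8859-1")  # each byte becomes one character

original = "\u00e1gora"        # "ágora"
garbled = misdecode(original)

print(garbled)        # Ã¡gora
print(len(original))  # 5
print(len(garbled))   # 6
```

The garbled length of 6 agrees with the endOffset of 6 reported for the five-letter word in the analysis output.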

First of all, I would check the URI encoding of your servlet container. It 
should be UTF-8. See Tomcat's config:
http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Not related to this issue, but I recommend using 1.4 as the schema version.
<schema name="ca_objects" version="1.4">


--- On Fri, 7/8/11, Villacorta Peral, Eva <e.villaco...@ibermatica.com> wrote:

> From: Villacorta Peral, Eva <e.villaco...@ibermatica.com>
> Subject: RE: Problem with first letter accented
> To: solr-user@lucene.apache.org
> Date: Friday, July 8, 2011, 10:26 AM
> I'm using CollectiveAccess and its DB structure. Perhaps this is useful...
> 
> My type definition is:
> 
> <schema name="ca_objects" version="1.1">
>     <types>
>         <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>             <analyzer>
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.ASCIIFoldingFilterFactory"/>
>                 <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
>             </analyzer>
>         </fieldType>
>         <fieldType name="string" class="solr.StrField" />
>         <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
>     </types>
> 
> And the analysis of "ágora" is:
> 
> Index Analyzer
> org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    Ã¡gora
> startOffset  0
> endOffset    6
> 
> org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    ã¡gora
> startOffset  0
> endOffset    6
> 
> org.apache.solr.analysis.ASCIIFoldingFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    a¡gora
> startOffset  0
> endOffset    6
> 
> org.apache.solr.analysis.EdgeNGramFilterFactory {maxGramSize=15, side=front, minGramSize=2, luceneMatchVersion=LUCENE_24}
> position     1     2      3       4        5
> term text    a¡    a¡g    a¡go    a¡gor    a¡gora
> startOffset  0     0      0       0        0
> endOffset    2     3      4       5        6
> 
> Query Analyzer
> org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    Ã¡gora
> startOffset  0
> endOffset    6
> 
> org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    ã¡gora
> startOffset  0
> endOffset    6
> 
> org.apache.solr.analysis.ASCIIFoldingFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    a¡gora
> startOffset  0
> endOffset    6
> 
> org.apache.solr.analysis.EdgeNGramFilterFactory {maxGramSize=15, side=front, minGramSize=2, luceneMatchVersion=LUCENE_24}
> position     1     2      3       4        5
> term text    a¡    a¡g    a¡go    a¡gor    a¡gora
> startOffset  0     0      0       0        0
> endOffset    2     3      4       5        6
> 
> I hope this was enough... Ask me whatever you need. Thx
> 
> 
> > I'm using Solr 3.3 for searching in different languages, one of
> > them Spanish. The ASCIIFoldingFilterFactory works fine, but if a
> > word begins with an accented letter, like "ágora" or "ínclito", it
> > can't find anything. I have to search for the word without the
> > accent in order to get any results. For instance:
> > 
> > - Title: Imágenes del ágora de la plaza central.
> > 
> > - Searching text: "imágenes" or "imagenes" returns the same result,
> >   the title above
> > 
> > - Searching text: "ágora" returns no results, while "agora" returns
> >   the right result
> 
> That's quite strange. We would need your field type definition.
> 
> Also, admin/analysis.jsp shows the step-by-step output of analysis.
> What happens to the words "ágora" or "ínclito" at index time and
> query time?
>
