RE: Problem with first letter accented

Villacorta Peral, Eva Fri, 08 Jul 2011 00:27:30 -0700

I'm using CollectiveAccess and its DB structure. Perhaps this is useful to know...


My type definition is:

<schema name="ca_objects" version="1.1">
        <types>
                <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
                        <analyzer>
                                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                                <filter class="solr.LowerCaseFilterFactory"/>
                                <filter class="solr.ASCIIFoldingFilterFactory"/>
                                <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
                        </analyzer>
                </fieldType>
                <fieldType name="string" class="solr.StrField" />
                <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
        </types>
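
For reference, here is a rough Python sketch of what this analyzer chain is meant to produce when the input reaches it correctly encoded. It is not Solr code: the helper names `ascii_fold`, `edge_ngrams` and `analyze` are made up for illustration, and Unicode decomposition is only an approximation of the folding table used by ASCIIFoldingFilterFactory.

```python
import unicodedata


def ascii_fold(text):
    # Approximation of ASCII folding: decompose each character and
    # drop the combining marks (accents), keeping the base letters.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_size=2, max_size=15):
    # Front-edge n-grams, mirroring minGramSize=2, maxGramSize=15, side=front.
    upper = min(len(token), max_size)
    return [token[:n] for n in range(min_size, upper + 1)]


def analyze(text):
    # Whitespace tokenizer -> lowercase -> ASCII folding -> edge n-grams.
    terms = []
    for token in text.split():
        folded = ascii_fold(token.lower())
        terms.extend(edge_ngrams(folded))
    return terms


print(analyze("\u00e1gora"))  # "\u00e1" is the accented a
```

With this sketch, the accented and unaccented spellings of the word yield the same terms, which is the behaviour I expected from the field type above.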

And the analysis of "ágora" is:

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_24}
position        1
term text       Ã¡gora
startOffset     0
endOffset       6

org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_24}
position        1
term text       ã¡gora
startOffset     0
endOffset       6

org.apache.solr.analysis.ASCIIFoldingFilterFactory {luceneMatchVersion=LUCENE_24}
position        1
term text       a¡gora
startOffset     0
endOffset       6

org.apache.solr.analysis.EdgeNGramFilterFactory {maxGramSize=15, side=front, minGramSize=2, luceneMatchVersion=LUCENE_24}
position        1       2       3       4       5
term text       a¡      a¡g     a¡go    a¡gor   a¡gora
startOffset     0       0       0       0       0
endOffset       2       3       4       5       6

Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_24}
position        1
term text       Ã¡gora
startOffset     0
endOffset       6

org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_24}
position        1
term text       ã¡gora
startOffset     0
endOffset       6

org.apache.solr.analysis.ASCIIFoldingFilterFactory {luceneMatchVersion=LUCENE_24}
position        1
term text       a¡gora
startOffset     0
endOffset       6

org.apache.solr.analysis.EdgeNGramFilterFactory {maxGramSize=15, side=front, minGramSize=2, luceneMatchVersion=LUCENE_24}
position        1       2       3       4       5
term text       a¡      a¡g     a¡go    a¡gor   a¡gora
startOffset     0       0       0       0       0
endOffset       2       3       4       5       6
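
One possible reading of the output above, offered only as an assumption and not a confirmed diagnosis: the term texts and the endOffset of 6 for a five-letter word look like what you get when the UTF-8 bytes of the accented word are read as ISO-8859-1 somewhere before analysis. The Python sketch below reproduces the same strings under that assumption; `ascii_fold` and `edge_ngrams` are hypothetical helpers that only approximate the Solr filters, and the sketch says nothing about where in the stack the misreading would happen.

```python
import unicodedata


def ascii_fold(text):
    # Approximation of ASCII folding: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_size=2, max_size=15):
    # Front-edge n-grams, mirroring minGramSize=2, maxGramSize=15, side=front.
    upper = min(len(token), max_size)
    return [token[:n] for n in range(min_size, upper + 1)]


intended = "\u00e1gora"  # the five-letter word with an accented first letter

# Assumption: the UTF-8 bytes of the word are decoded as ISO-8859-1.
misread = intended.encode("utf-8").decode("iso-8859-1")

lowered = misread.lower()      # lowercase step
folded = ascii_fold(lowered)   # folding step
grams = edge_ngrams(folded)    # edge n-gram step

print(len(intended), len(misread))
print(misread, lowered, folded, grams)
```

Under this assumption the accented letter turns into two characters, the second of which is not a letter and so survives folding. The resulting terms would never match the ones built from the unaccented spelling.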

I hope this was enough... Ask me whatever you need. Thx


> I'm using Solr 3.3 for searching in different languages,
> one of them Spanish. The ASCIIFoldingFilterFactory works
> fine, but if a word begins with an accented letter, like
> "ágora" or "ínclito", it can't find anything. I have to
> search for the word without the accent to get any results.
> For instance:
>
> - Title: Imágenes del ágora de la plaza central.
>
> - Searching text: "imágenes" or "imagenes" returns the same
>   result, the title above
>
> - Searching text: "ágora" returns no results, while "agora"
>   returns the right result

That's quite strange. We would need to see your field type definition.

Also, admin/analysis.jsp shows the step-by-step output of analysis.
What happens to the words "ágora" or "ínclito" at index time and query time?
