Hello,

As I see from analysis.jsp, your á letter is not converted to 'a' by the ASCII folding filter. For some reason it is recognized as two characters, 'á', before it reaches the ASCII folding filter.
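To illustrate what I mean, here is a minimal Python sketch of one way this can happen. It assumes the query arrives percent-encoded as UTF-8 and is decoded as ISO-8859-1 on the server side; I have not confirmed that this is what your container does. The `fold_to_ascii` helper is a hypothetical, rough stand-in for accent folding and is not Solr's ASCIIFoldingFilter.

```python
import unicodedata
from urllib.parse import quote, unquote


def fold_to_ascii(text: str) -> str:
    """Rough stand-in for accent folding (NOT Solr's filter):
    decompose with NFKD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


word = "\u00e1gora"  # "ágora", written with an escape to avoid source-encoding issues

# A client typically sends the query percent-encoded as UTF-8 bytes.
encoded = quote(word)

# Assumption: the server decodes those bytes as ISO-8859-1 instead of UTF-8,
# so the two bytes of "á" become two separate characters.
misdecoded = unquote(encoded, encoding="latin-1")
correct = unquote(encoded, encoding="utf-8")

print(len(correct), len(misdecoded))
print(fold_to_ascii(misdecoded.lower()))
print(fold_to_ascii(correct.lower()))
```

With the wrong charset the word is six characters long rather than five, which would be consistent with the endOffset of 6 in the analysis output, and lowercasing plus folding can only clean up the first of the two characters.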
First of all, I would check the URI encoding of your servlet container. It should be UTF-8. See Tomcat's config:
http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Not related to this issue, but I recommend using 1.4 as the schema version:

<schema name="ca_objects" version="1.4">

--- On Fri, 7/8/11, Villacorta Peral, Eva <e.villaco...@ibermatica.com> wrote:

> From: Villacorta Peral, Eva <e.villaco...@ibermatica.com>
> Subject: RE: Problem with first letter accented
> To: solr-user@lucene.apache.org
> Date: Friday, July 8, 2011, 10:26 AM
>
> I'm using collectiveaccess, and its DB structure. Perhaps this is useful...
>
> My type definition is:
>
> <schema name="ca_objects" version="1.1">
>   <types>
>     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
>       </analyzer>
>     </fieldType>
>     <fieldType name="string" class="solr.StrField" />
>     <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
>   </types>
>
> And the analysis of "ágora" is:
>
> Index Analyzer
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    ágora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    ã¡gora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.ASCIIFoldingFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    a¡gora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.EdgeNGramFilterFactory {maxGramSize=15, side=front, minGramSize=2, luceneMatchVersion=LUCENE_24}
> position     1    2     3      4       5
> term text    a¡   a¡g   a¡go   a¡gor   a¡gora
> startOffset  0    0     0      0       0
> endOffset    2    3     4      5       6
>
> Query Analyzer
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    ágora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    ã¡gora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.ASCIIFoldingFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    a¡gora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.EdgeNGramFilterFactory {maxGramSize=15, side=front, minGramSize=2, luceneMatchVersion=LUCENE_24}
> position     1    2     3      4       5
> term text    a¡   a¡g   a¡go   a¡gor   a¡gora
> startOffset  0    0     0      0       0
> endOffset    2    3     4      5       6
>
> I hope this was enough... Ask me whatever you need. Thx
>
> > I'm using Solr 3.3 for searching in different languages,
> > one of them is Spanish. The ASCIIFoldingFilterFactory works
> > fine, but if a word begins with an accented letter, like
> > "ágora" or "ínclito", it can't find anything. I have to
> > search for the word without the accent in order to find any
> > result. For instance:
> >
> > - Title: Imágenes del ágora de la plaza central.
> >
> > - Searching text: "imágenes" or "imagenes" returns the same
> >   result, the title above
> >
> > - Searching text: "ágora" returns no results, while "agora"
> >   returns the right result
>
> That's quite strange. Your field type definition would be needed.
> Also, admin/analysis.jsp shows the step-by-step output of analysis.
> What happens to the words "ágora" or "ínclito" at index time and
> query time?