Hi!

I'm sorry if this message is a duplicate; my mail server gave me an error.

I've changed server.xml to add the URI encoding, changed the schema
version to 1.4, and reindexed my DB, but nothing has changed. In
analysis.jsp I searched for "más" to see what happens with that word,
and its accented letter is also recognized as two characters, just like
in "ágora". Yet the search works for "más". Could the order in which the
filters are applied be relevant? I haven't read anything about it, but...

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Friday, July 8, 2011 9:57
To: solr-user@lucene.apache.org
Subject: RE: Problem with first letter accented

Hello,

As I see from analysis.jsp, your á letter is not converted to 'a' by the
ASCII folding filter. It is recognized as two characters 'á' (before it
comes to ASCII folding) for some reason.

First of all I would check the URI encoding of my servlet container. It
should be UTF-8. See Tomcat's config:
http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Not related to this issue, but I recommend you use 1.4 as the schema
version.

<schema name="ca_objects" version="1.4">

--- On Fri, 7/8/11, Villacorta Peral, Eva <e.villaco...@ibermatica.com> wrote:

> From: Villacorta Peral, Eva <e.villaco...@ibermatica.com>
> Subject: RE: Problem with first letter accented
> To: solr-user@lucene.apache.org
> Date: Friday, July 8, 2011, 10:26 AM
>
> I'm using collectiveaccess, and its DB structure. Perhaps this is
> useful...
>
> My type definition is:
>
> <schema name="ca_objects" version="1.1">
>   <types>
>     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
>       </analyzer>
>     </fieldType>
>     <fieldType name="string" class="solr.StrField" />
>     <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
>   </types>
>
> And the analysis of "ágora" is:
>
> Index Analyzer
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    ágora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    ã¡gora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.ASCIIFoldingFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    a¡gora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.EdgeNGramFilterFactory {maxGramSize=15, side=front, minGramSize=2, luceneMatchVersion=LUCENE_24}
> position     1    2    3     4      5
> term text    a¡   a¡g  a¡go  a¡gor  a¡gora
> startOffset  0    0    0     0      0
> endOffset    2    3    4     5      6
>
> Query Analyzer
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    ágora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    ã¡gora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.ASCIIFoldingFilterFactory {luceneMatchVersion=LUCENE_24}
> position     1
> term text    a¡gora
> startOffset  0
> endOffset    6
>
> org.apache.solr.analysis.EdgeNGramFilterFactory {maxGramSize=15, side=front, minGramSize=2, luceneMatchVersion=LUCENE_24}
> position     1    2    3     4      5
> term text    a¡   a¡g  a¡go  a¡gor  a¡gora
> startOffset  0    0    0     0      0
> endOffset    2    3    4     5      6
>
> I hope this was enough... Ask me whatever you need. Thx
>
> > I'm using Solr 3.3 for searching in different languages, one of
> > them is Spanish. The ASCIIFoldingFilterFactory works fine, but if a
> > word begins with an accented letter, like "ágora" or "ínclito", it
> > can't find anything. I have to search for the word without the
> > accent in order to get any results. For instance:
> >
> > - Title: Imágenes del ágora de la plaza central.
> >
> > - Searching text: "imágenes" or "imagenes" returns the same result,
> >   the title above
> >
> > - Searching text: "ágora" returns no results, while "agora" returns
> >   the right result
>
> That's quite strange. Your field type definition would be needed, and
> admin/analysis.jsp shows the step-by-step output of analysis. What
> happens to the words "ágora" or "ínclito" at index time and query
> time?
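
The analysis output quoted above can be reproduced outside Solr with a minimal Python sketch. It rests on two assumptions that the thread does not confirm: that the UTF-8 bytes of the query are being decoded as ISO-8859-1 somewhere on the way in, and that the indexed text itself was decoded correctly. The helper names (`mojibake`, `fold_ascii`, `edge_ngrams`) are hypothetical, and `fold_ascii` is only a rough stand-in for `ASCIIFoldingFilterFactory`, not Lucene's actual folding table.

```python
import unicodedata


def mojibake(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def fold_ascii(text: str) -> str:
    """Rough stand-in for ASCII folding: drop combining marks after NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token: str, min_size: int = 2, max_size: int = 15) -> list[str]:
    """Front edge n-grams, mirroring minGramSize=2, maxGramSize=15, side=front."""
    return [token[:n] for n in range(min_size, min(len(token), max_size) + 1)]


def analyze(text: str, garbled: bool) -> list[str]:
    """Lowercase, fold, and n-gram a single token, optionally mis-decoding it first."""
    if garbled:
        text = mojibake(text)
    return edge_ngrams(fold_ascii(text.lower()))


if __name__ == "__main__":
    for word in ("ágora", "imágenes", "más"):
        query_terms = analyze(word, garbled=True)    # assumed: query mis-decoded
        index_terms = analyze(word, garbled=False)   # assumed: index decoded correctly
        shared = sorted(set(query_terms) & set(index_terms))
        print(word, query_terms, "shared with index:", shared)
```

Under those assumptions, the mis-decoded "ágora" becomes the six-character "a¡gora" and yields the same grams as the analysis page, none of which match the grams of a correctly indexed "agora". "imágenes" and "más" still share their leading grams ("im", "ima" and "ma") with the index, which would be consistent with only first-letter accents failing. If this reading is right, the filter order is not the cause; the encoding on the query path is.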