Unfortunately the current SynonymFilter cannot handle posInc != 1 ...
we could perhaps try to fix this ... patches welcome :)

So for now it's best to place SynonymFilter before StopFilter, and
before any other filters that may create graph tokens (posLen > 1,
posInc == 0).
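
To make the posInc point concrete, here is a toy Python model of token positions. It is only a sketch, not Lucene code: the function names (stop_filter, positions, flatten_pos_inc) are made up for illustration, and flatten_pos_inc merely mimics the symptom reported below (a filter that treats every incoming token as posInc == 1). It does not reproduce SynonymFilter's real internals.

```python
# Toy model of token positions; NOT Lucene code. All names are hypothetical.

def stop_filter(tokens, stopwords):
    """Drop stopwords, folding their position increments into the next kept token."""
    out, pending = [], 0
    for term, pos_inc in tokens:
        if term in stopwords:
            pending += pos_inc  # remember the gap left by the removed token
        else:
            out.append((term, pos_inc + pending))
            pending = 0
    return out


def positions(tokens):
    """Turn relative position increments into absolute, 1-based positions."""
    pos, result = 0, []
    for term, pos_inc in tokens:
        pos += pos_inc
        result.append((term, pos))
    return result


def flatten_pos_inc(tokens):
    """Mimic a filter that assumes posInc == 1 on every input token."""
    return [(term, 1) for term, _ in tokens]


tokens = [("documentación", 1), ("para", 1), ("agentes", 1)]
after_stop = stop_filter(tokens, {"para"})

print(positions(after_stop))                   # gap kept: positions 1 and 3
print(positions(flatten_pos_inc(after_stop)))  # gap lost: positions 1 and 2
```

In this model the stop filter leaves "agentes" with posInc == 2, and any later stage that ignores that increment collapses the gap. That is why running the synonym stage first sidesteps the problem.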

Mike McCandless

http://blog.mikemccandless.com


On Mon, Sep 23, 2013 at 2:45 AM,  <david.dav...@correo.aeat.es> wrote:
> Hi,
>
> I am having a problem applying StopFilterFactory and
> SynonymFilterFactory. The problem is that SynonymFilter removes the gaps
> that were previously left by the StopFilterFactory. I'm applying the filters
> at query time, because users need to change synonym lists frequently.
>
> This is my schema, and an example of the issue:
>
>
> String: "documentacion para agentes"
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory
> {luceneMatchVersion=LUCENE_35}
> position        1       2       3
> term text       documentación    para   agentes
> startOffset     0       14      19
> endOffset       13      18      26
> org.apache.solr.analysis.LowerCaseFilterFactory
> {luceneMatchVersion=LUCENE_35}
> position        1       2       3
> term text       documentación    para   agentes
> startOffset     0       14      19
> endOffset       13      18      26
> org.apache.solr.analysis.StopFilterFactory {words=stopwords_intranet.txt,
> ignoreCase=true, enablePositionIncrements=true,
> luceneMatchVersion=LUCENE_35}
> position        1       3
> term text       documentación   agentes
> startOffset     0       19
> endOffset       13      26
> org.apache.solr.analysis.SynonymFilterFactory
> {synonyms=sinonimos_intranet.txt, expand=true, ignoreCase=true,
> luceneMatchVersion=LUCENE_35}
> position        1               2
> term text       documentación   agente
>                 archivo         agentes
> type            SYNONYM         SYNONYM
>                 SYNONYM         SYNONYM
> startOffset     0               19
>                 0               19
> endOffset       13              26
>                 13              26
>
>
> As you can see, the positions should be 1 and 3, but SynonymFilter removes
> the gap and moves the token from position 3 to 2.
> I've got the same problem with Solr 3.5 and 4.0.
> I don't know if it's a bug or an error in my configuration. In other
> schemas that I have worked with, I had always put the SynonymFilter
> before the StopFilter, but in this one I preferred this order because of
> the large number of synonyms in the list (i.e. I don't want to generate
> a lot of synonyms for a word that I really want to remove).
>
> Thanks,
>
> David Dávila Atienza
> AEAT - Departamento de Informática Tributaria
> Subdirección de Tecnologías de Análisis de la Información e Investigación
> del Fraude
> Área de Infraestructuras
> Teléfono: 915831543
> Extensión: 31543
