Dharan Althuru wrote:
Hi,
We are trying to incorporate a synonym filter during indexing using Nutch. As
per my understanding, Nutch doesn't have a synonym indexing plug-in by default.
Can we extend IndexFilter in Nutch to incorporate the synonym filter plug-in
available in Lucene, using WordNet or a custom synonym plug-in, without any
negative impact on existing Nutch indexing (e.g., its handling of bigrams)?
Synonym expansion should be done when the text is analyzed (by an
Analyzer), so you can reuse Lucene's synonym filter.
Unfortunately, analysis happens at different stages depending on whether
you use the built-in Lucene indexer or the Solr indexer.
If you use the Lucene indexer, analysis happens in LuceneWriter, and the
only way to affect it is to implement an analysis plugin, so that your
analyzer is returned from AnalyzerFactory and used instead of the
default one. See e.g. analysis-fr for an example of how to implement
such a plugin.
However, when you index to Solr, analysis is done on the Solr side, so you
need to configure Solr's analysis chain: in your schema.xml, define the
fieldType so that it has the synonym filter in its index-time analysis chain.
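
For the Solr case, a fieldType along the following lines is a minimal
sketch of what I mean. The type name "text_syn", the choice of tokenizer
and the file name "synonyms.txt" are assumptions for illustration only;
adapt them to the field types your schema.xml already defines.

```xml
<!-- Sketch only: "text_syn" and "synonyms.txt" are placeholder names. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <!-- Index-time chain: synonyms are expanded here, as tokens are indexed. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query-time chain: no synonym filter, since expansion was done at index time. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The fields that should get synonym expansion then need to use this type
in their <field> definitions. The synonyms file lives in the Solr conf
directory next to schema.xml.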
--
Best regards,
Andrzej Bialecki
___. ___ ___ ___ _ _ __
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com