Hi there,

I was wondering whether it would be possible to add a feature to the
indexing engine (or somehow simulate it) that does the EXACT opposite
of Lucy::Analysis::SnowballStopFilter. In other words, instead of
blocking a list of stopwords, the indexing engine would index ONLY the
phrases supplied in a user list, matched exactly. Or, even better, it
would prioritize them: index the terms from the user list first, and
then use the Lucy analyzer for words that are not in the list.
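
To make the request concrete, here is a rough sketch of the behaviour I
have in mind. It is plain Python, not Lucy code: the function names
(build_protected_pattern, tokenize), the only_listed flag, and the
lowercasing fallback are all made up for illustration, and the fallback
is only a stand-in for whatever a real Lucy analyzer chain would do.

```python
import re


def build_protected_pattern(terms):
    """Compile one regex that matches any user-supplied term exactly.

    Hypothetical helper, not part of Lucy. Longer terms come first so that
    "NH4+(H+)" is preferred over its prefix "NH4+".
    """
    ordered = sorted(set(terms), key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    # The lookarounds stop a term from matching inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")


def default_fallback(chunk):
    """Stand-in for a normal analyzer: split on word characters, lowercase."""
    return [w.lower() for w in re.findall(r"\w+", chunk)]


def tokenize(text, terms, only_listed=False, fallback=default_fallback):
    """Emit listed terms verbatim; analyze the rest with `fallback`.

    With only_listed=True, everything outside the user list is dropped,
    which is the "opposite of a stop filter" mode.
    """
    if not terms:
        return [] if only_listed else fallback(text)

    pattern = build_protected_pattern(terms)
    tokens = []
    pos = 0
    for match in pattern.finditer(text):
        if not only_listed:
            tokens.extend(fallback(text[pos:match.start()]))
        tokens.append(match.group(0))  # kept exactly as written
        pos = match.end()
    if not only_listed:
        tokens.extend(fallback(text[pos:]))
    return tokens


text = "Binding of [Hg(CN)2] and NH4+ was measured in WAS mutants."
terms = ["[Hg(CN)2]", "NH4+", "WAS"]
print(tokenize(text, terms))
print(tokenize(text, terms, only_listed=True))
```

Matching here is case-sensitive on purpose, so the gene name "WAS" is
kept intact while the ordinary word "was" still goes through the normal
analysis.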

Why would this be useful? In chemistry, for example, it is simply
impossible to write a rule that will index chemical names correctly
(e.g. NH4+/H+K+/NH4+(H+), [Hg(CN)2], Ca(.-), to name just a few of
thousands). Also, in biomedical text, some seemingly common words can
represent a gene or protein name, which should not be stemmed. To
summarize, this feature would allow one to build a correct index of
specialized terms.

Alex
