Re: Question about PatternReplace filter and automatic Synonym generation

Christian Zambrano Mon, 05 Oct 2009 21:01:11 -0700

Prasanna,

Wouldn't it be better to use built-in token filters at both index and query time that will convert 'it!' to just 'it'? I believe the WordDelimiterFilterFactory will do that for you.
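
To illustrate the idea of normalizing at both index and query time, here is a minimal Python sketch. It is not Solr code and does not reproduce what WordDelimiterFilterFactory actually does; `normalize` and `analyze` are hypothetical helpers that only strip leading and trailing punctuation and lowercase each term, so that the same analysis runs on both sides.

```python
import re

# Hypothetical stand-in for an analysis chain; not the Solr implementation.
_EDGE_PUNCT = re.compile(r"^\W+|\W+$")

def normalize(term):
    """Strip leading/trailing non-word characters and lowercase the term."""
    return _EDGE_PUNCT.sub("", term).lower()

def analyze(text):
    """Whitespace-tokenize, normalize each token, and drop empty results."""
    return [t for t in (normalize(w) for w in text.split()) if t]

# The same analysis is applied to the indexed title and to the query.
index_terms = analyze("It! Lives Again")
query_terms = analyze("it")
```

Because both sides reduce 'it!' to 'it', the query matches without any synonym entry. The trade-off is that 'it!' and 'it' can no longer be told apart in this field.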


Christian

On Oct 5, 2009, at 7:31 PM, Prasanna Ranganathan <pranganat...@netflix.com> wrote:

On 10/5/09 2:46 AM, "Shalin Shekhar Mangar" <shalinman...@gmail.com> wrote:
Alternatively, is there a filter available which takes in a pattern and produces additional forms of the token depending on the pattern? The use
case I am looking at here is using such a filter to automate synonym
generation. In our application, quite a few of the synonym file entries match a specific pattern and having such a filter would make it easier, I believe. Please correct me in case I am missing some unwanted side-effect
with this approach.
I do not understand this. TokenFilters are used for things like stemming, replacing patterns, lowercasing, n-gramming etc. The synonym filter inserts
additional tokens (synonyms) from a file for each token.

What exactly are you trying to do with synonyms? I guess you could do
stemming etc with synonyms but why do you want to do that?
I'll try to explain with an example. Given the term 'it!' in the title, it should match both 'it' and 'it!' in the query as an exact match. Currently, this is done by using a synonym entry (and an index-time SynonymFilter) as
follows:

it! => it, it!
Now, the above holds true for all cases where you have a title token of the
form [a-zA-Z]+!. Handling all of those cases requires adding synonyms
manually for each case, which is not easy to manage and does not scale.
I am hoping to do the same by using an index-time filter that takes in a pattern like the PatternReplace filter and adds the newly created token instead of replacing the original one. Does this make sense? Am I missing
something that would break this approach?
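
The behavior being asked for can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing Solr or Lucene filter: `add_pattern_variants` and `TRAILING_BANG` are hypothetical names, and the position increment of 0 mimics how a synonym filter stacks an extra token on the same position as the original.

```python
import re

# Hypothetical pattern for title tokens of the form letters followed by '!'.
TRAILING_BANG = r"^([A-Za-z]+)!$"

def add_pattern_variants(tokens, pattern, replacement):
    """Yield (term, position_increment) pairs.

    Every original token is kept. When the pattern rewrites a token into a
    different, non-empty form, that form is emitted at the same position
    (increment 0) instead of replacing the original.
    """
    regex = re.compile(pattern)
    for term in tokens:
        yield term, 1
        variant = regex.sub(replacement, term)
        if variant and variant != term:
            yield variant, 0

tokens = ["it!", "is", "alive"]
result = list(add_pattern_variants(tokens, TRAILING_BANG, r"\1"))
```

For the token stream above, 'it!' is kept and 'it' is added at the same position, which corresponds to the synonym entry 'it! => it, it!' without listing each case by hand.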
Note that a change in synonym file needs a re-index of the affected
documents. Also, the synonym map is kept in memory.
What is the overhead incurred in having an additional filter applied during
indexing? Is it strictly CPU only?

Thanks a lot for your valuable input.

Regards,

Prasanna.
