KStem custom lexicons configuration possible?

2011-06-20 Thread Lukáš Vlček
Hi, is there any API in the KStem filter for configuring lexicons? As far as I understand, the original code works in such a way that the lexicons are loaded from files at startup (see http://lexicalresearch.com/kstem-doc.txt). The author (Robert Krovetz) names the possibility of modifying the lexicons among

Re: KStem custom lexicons configuration possible?

2011-06-20 Thread Lukáš Vlček
Maybe I should show some examples where I think custom configuration can be useful. Let me give you two examples: 1) Currently, KStem conflates both connector and connected to the same term, connect. 2) Conversely, it does not conflate transaction and transactions to the same

Re: KStem custom lexicons configuration possible?

2011-06-20 Thread Robert Muir
On Mon, Jun 20, 2011 at 7:19 AM, Lukáš Vlček lukas.vl...@gmail.com wrote: Having an option to modify the internal lexicons, I would be able to adapt KStem to work better for specific text corpora. What do you think? Please use StemmerOverrideFilter for this! It works with all stemmers,
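A minimal sketch of the override idea, written in plain Java rather than against Lucene's API: an exact-match dictionary is consulted first, and only terms missing from it reach the stemmer. In Lucene, StemmerOverrideFilter sits before the stemmer in the analysis chain and marks overridden tokens as keywords so the stemmer leaves them alone; here both steps are collapsed into one hypothetical `stem` function. `toyStem` is a hypothetical stand-in for KStem that only mimics the two behaviours described earlier in the thread (connector and connected conflated, transactions left alone).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class OverrideStemmerSketch {
    // Hypothetical stand-in for KStem: strips a trailing "ed" or "or",
    // so "connected" and "connector" both become "connect", while
    // "transactions" is left untouched.
    static String toyStem(String term) {
        if (term.length() > 4 && (term.endsWith("ed") || term.endsWith("or"))) {
            return term.substring(0, term.length() - 2);
        }
        return term;
    }

    // Consults the override dictionary first (exact match only) and
    // falls back to the stemmer for every other term.
    static String stem(String term, Map<String, String> overrides, UnaryOperator<String> stemmer) {
        String override = overrides.get(term);
        return override != null ? override : stemmer.apply(term);
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("connector", "connector");      // keep connector apart from connect
        overrides.put("transactions", "transaction"); // conflate plural with singular
        for (String t : new String[] {"connector", "connected", "transactions", "connectors"}) {
            System.out.println(t + " -> " + stem(t, overrides, OverrideStemmerSketch::toyStem));
        }
    }
}
```

Because the lookup is by exact key, a form such as `connectors` that is not in the map still falls through to the stemmer (and here comes out unchanged), so every term to override has to be listed in advance.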

Re: KStem custom lexicons configuration possible?

2011-06-20 Thread Lukáš Vlček
Hi Robert, this sounds interesting; I will look at it in more detail. However, I do not think this is really a general solution. If I understand StemmerOverrideFilter correctly (from a quick glance), it relies on the fact that you *know* the exact term (the key in the map) in advance. In other words if

Re: KStem custom lexicons configuration possible?

2011-06-20 Thread Robert Muir
On Mon, Jun 20, 2011 at 8:23 AM, Lukáš Vlček lukas.vl...@gmail.com wrote: Hi Robert, this sounds interesting; I will look at it in more detail. However, I do not think this is really a general solution. If I understand StemmerOverrideFilter correctly (from a quick glance), it relies on the fact

Re: KStem custom lexicons configuration possible?

2011-06-20 Thread Lukáš Vlček
Hi Robert, I think the difference between KStem and other stemmers (at least those I am aware of, like Snowball or Porter) is that KStem is expected to produce real, valid words, and thus other filtering (for example, synonym expansion) can more easily be applied to the tokens after stemming.
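To illustrate the point about filtering after stemming, here is a hypothetical stem-then-expand chain in plain Java (not Lucene's analysis API): a synonym dictionary keyed on ordinary words only matches when the stemmer emits real words. Both stemmers and the `connect -> link` synonym entry are made up for illustration; the truncating stemmer is not Porter or Snowball, just a crude stand-in for a stemmer whose output is not always a dictionary word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class StemThenSynonymSketch {
    // Stems each token, then appends any synonyms found for the stem.
    // The synonym keys are ordinary dictionary words (hypothetical data).
    static List<String> analyze(List<String> tokens, UnaryOperator<String> stemmer,
                                Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String stem = stemmer.apply(token);
            out.add(stem);
            // The lookup only succeeds when the stem is spelled like a real word.
            out.addAll(synonyms.getOrDefault(stem, List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = Map.of("connect", List.of("link"));
        // Hypothetical word-producing stemmer: maps "connected" to "connect".
        UnaryOperator<String> wordStemmer = t -> t.equals("connected") ? "connect" : t;
        // Hypothetical truncating stemmer: cuts tokens to at most 6 characters.
        UnaryOperator<String> truncatingStemmer = t -> t.length() > 6 ? t.substring(0, 6) : t;
        List<String> tokens = List.of("connected", "cables");
        System.out.println(analyze(tokens, wordStemmer, synonyms));       // [connect, link, cables]
        System.out.println(analyze(tokens, truncatingStemmer, synonyms)); // [connec, cables]
    }
}
```

With the word-producing stemmer the synonym entry fires; with the truncating one the stem `connec` has no dictionary entry, so the expansion is silently skipped.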