Hello, 

Sorry for reviving this thread again, but I have come across another question 
related to it. 

When working with stemming and stop word lists in order to pre-process the text 
data, wouldn't this mean that there are as many language models as there are 
parameter combinations? 
For instance, if I have boolean pre-processing parameters in my application - 
useStemming yes/no and useStopList yes/no - do I end up with 2^2 = 4 language 
models? Perhaps a naive question, but it seems that the use of such 
pre-processing parameters inflates the LM data that I need to manage quite a 
bit.
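
To make the counting concrete, here is a minimal sketch in Python (not OpenNLP 
code). The stop list, the toy_stem function and the sample sentences are 
hypothetical stand-ins that I made up for illustration; a real setup would plug 
in a proper stemmer and stop list. It builds one bigram count table per 
parameter combination, so two boolean flags give 2^2 = 4 tables:

```python
from collections import Counter
from itertools import product

# Hypothetical stand-in for a real stop list.
STOP_WORDS = {"the", "a", "of", "on"}


def toy_stem(token):
    """Strip a few English suffixes; a crude placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text, use_stemming, use_stop_list):
    """Tokenize on whitespace, then apply the optional pre-processing steps."""
    tokens = text.lower().split()
    if use_stop_list:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if use_stemming:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


def bigram_counts(tokens):
    """Absolute frequencies of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def build_models(texts):
    """Build one bigram table per (use_stemming, use_stop_list) combination."""
    models = {}
    for use_stemming, use_stop_list in product([False, True], repeat=2):
        counts = Counter()
        for text in texts:
            tokens = preprocess(text, use_stemming, use_stop_list)
            counts.update(bigram_counts(tokens))
        models[(use_stemming, use_stop_list)] = counts
    return models


models = build_models(["the cats sat on the mat", "a cat sits on mats"])
for config, counts in sorted(models.items()):
    print(config, len(counts), "distinct bigrams")
```

Each additional boolean flag would double the number of tables again, which is 
the growth I am asking about.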

Cheers, 

Martin
 

On 23.02.2014 at 15:24, Jörn Kottmann <[email protected]> wrote:

> Hello,
> 
> the current trunk version includes the Porter and Snowball stemmers. We 
> didn't develop them ourselves but redistribute them as part of OpenNLP.
> It would be nice to add more stemmers. If you need a particular one, please 
> point it out, and we might be able to redistribute it as well. Or maybe just 
> implement it.
> 
> We don't have stoplists, but I think it will be easy to change that. We could 
> probably use the ones from snowball.
> 
> There is no language modeling, it would be nice to get a contribution there. 
> Maybe you are interested in implementing it?
> 
> Anyway, it would be nice if you could open two Jira issues to request stopword 
> lists and the language model.
> 
> Jörn
> 
> On 02/23/2014 02:35 PM, Martin Wunderlich wrote:
>> Hi all,
>> 
>> I recently started working with OpenNLP for a project in the area of text 
>> classification with neural networks. So far, OpenNLP is a great library and 
>> very useful.
>> There are just three things that I haven't been able to find, but maybe they 
>> do exist:
>> - language models: e.g. to create a bigram language model with relative and 
>> absolute frequencies from several texts
>> - stemming: to reduce different word forms in inflected languages to a 
>> canonical root form
>> - stoplist: to remove certain words (e.g. from the language model) that are 
>> deemed irrelevant
>> 
>> Do these functions exist in OpenNLP? If not, can you recommend another 
>> library to complement these functions?
>> 
>> Kind regards,
>> 
>> Martin
>> 
>> 
> 
