Hello,

Sorry for reviving this thread again, but I have come across another question related to it.
When working with stemming and stop word lists to pre-process the text data, wouldn't this mean that there are as many language models as there are parameter combinations? For instance, if I have two boolean pre-processing parameters in my application (useStemming yes/no and useStopList yes/no), do I end up with 2^2 = 4 language models?

Perhaps a naive question, but it seems that using such pre-processing parameters inflates the LM data I need to manage quite a bit.

Cheers,
Martin

On 23.02.2014 at 15:24, Jörn Kottmann <[email protected]> wrote:

> Hello,
>
> the current trunk version includes the Porter and Snowball stemmers. We
> didn't develop them ourselves but redistribute them as part of OpenNLP.
> It would be nice to add more stemmers. In case you need a certain one, it
> would be nice if you could point it out, and we might be able to
> redistribute it as well, or maybe just implement it.
>
> We don't have stoplists, but I think it will be easy to change that. We
> could probably use the ones from Snowball.
>
> There is no language modeling; it would be nice to get a contribution
> there. Maybe you are interested in implementing it?
>
> Anyway, it would be nice if you could open two Jira issues to request
> stopword lists and the language model.
>
> Jörn
>
> On 02/23/2014 02:35 PM, Martin Wunderlich wrote:
>> Hi all,
>>
>> I recently started working with OpenNLP for a project in the area of text
>> classification with neural networks. So far, OpenNLP is a great library
>> and very useful.
>> There are just three things that I haven't been able to find, but maybe
>> they do exist:
>> - language models: e.g. to create a bigram language model with relative
>>   and absolute frequencies from several texts
>> - stemming: to reduce different word forms in inflected languages to a
>>   canonical root form
>> - stoplist: to remove certain words (e.g. from the language model) that
>>   are deemed irrelevant
>>
>> Do these functions exist in OpenNLP? If not, can you recommend another
>> library to complement these functions?
>>
>> Kind regards,
>>
>> Martin
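To make the combinatorics in the question concrete, here is a minimal sketch using only the Java standard library. It is not OpenNLP API; per this thread, OpenNLP has no language-model component. It builds a bigram table with absolute counts and relative frequencies for every combination of the two boolean switches. Assumptions: `toyStem` is a crude, hypothetical suffix stripper standing in for the Porter/Snowball stemmers, `STOP_WORDS` is a small illustrative list, stop words are removed before bigrams are formed, and "relative frequency" is taken as a bigram's share of all bigrams in its table (a conditional P(w2 | w1) would divide by the count of w1 instead).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class PreprocessingVariants {

    // Hypothetical stop list for illustration; a real one could come from Snowball.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "of", "and", "to", "in");

    // Toy suffix stripper standing in for a real stemmer (e.g. Porter or Snowball).
    static String toyStem(String token) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            // Only strip when at least three characters remain.
            if (token.length() >= suffix.length() + 3 && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Lower-cases and tokenizes, then applies the two switches from the question.
    // Stop words are dropped before bigrams are formed, so their neighbours become adjacent.
    static List<String> preprocess(String text, boolean useStemming, boolean useStopList) {
        List<String> tokens = new ArrayList<>();
        // (?U) makes \W Unicode-aware, so letters such as umlauts stay inside words.
        for (String raw : text.toLowerCase(Locale.ROOT).split("(?U)\\W+")) {
            if (raw.isEmpty()) continue; // leading punctuation yields an empty first token
            if (useStopList && STOP_WORDS.contains(raw)) continue;
            tokens.add(useStemming ? toyStem(raw) : raw);
        }
        return tokens;
    }

    // Absolute bigram frequencies, keyed as "w1 w2".
    static Map<String, Integer> bigramCounts(List<String> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            counts.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
        }
        return counts;
    }

    // Relative frequency: each bigram's share of all bigram occurrences in the table.
    static Map<String, Double> relativeFrequencies(Map<String, Integer> counts) {
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> relative = new TreeMap<>();
        counts.forEach((bigram, count) -> relative.put(bigram, (double) count / total));
        return relative;
    }

    // One table per combination of the two boolean switches: 2^2 = 4 tables.
    static Map<String, Map<String, Integer>> buildAll(String text) {
        Map<String, Map<String, Integer>> models = new LinkedHashMap<>();
        for (boolean stem : new boolean[] {false, true}) {
            for (boolean stop : new boolean[] {false, true}) {
                models.put("stem=" + stem + ",stop=" + stop,
                        bigramCounts(preprocess(text, stem, stop)));
            }
        }
        return models;
    }

    public static void main(String[] args) {
        String text = "The dogs walked to the park and the dog walks in the parks.";
        buildAll(text).forEach((config, counts) ->
                System.out.println(config + " -> " + counts + " " + relativeFrequencies(counts)));
    }
}
```

For the sample sentence the four tables hold 12, 5, 9 and 3 distinct bigrams, so n independent boolean switches can indeed yield up to 2^n different tables. In this sketch, each stemmed table could in principle be aggregated from its unstemmed counterpart, because stemming maps each token independently; the stop-list tables cannot be derived from the unfiltered ones, because dropping stop words makes their neighbours adjacent (e.g. "walked park").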
