And also snowball: http://snowball.tartarus.org/

Jörn

On Fri, Jul 7, 2017 at 9:10 AM, Rodrigo Agerri <rodrigo.age...@ehu.eus> wrote:
> Hello,
>
> The stemmer algorithm implemented in OpenNLP is this one:
>
> https://tartarus.org/martin/PorterStemmer/
>
> Regarding the "null" lemma, are you using OpenNLP to lemmatize?
>
> Rodrigo
>
> On Fri, Jul 7, 2017 at 5:47 AM, Ling <lingv...@gmail.com> wrote:
>> I use it indirectly through another library; there is a function
>> token.getLemma().
>>
>> On Jul 6, 2017 7:24 PM, "John Stewart" <cane.c...@gmail.com> wrote:
>>
>>> I'm asking because I thought there are no pre-trained models for the
>>> lemmatizer. How are you using it exactly? There's also an option to use a
>>> dictionary, e.g.
>>> https://stackoverflow.com/questions/38982423/opennlp-lemmatization-example
>>>
>>> AFAIK the models in 1.8.1 are the same as in 1.5.3.
>>>
>>> jds
>>>
>>> On Thu, Jul 6, 2017 at 6:26 PM, Ling <lingv...@gmail.com> wrote:
>>>
>>> > OpenNLP 1.5.3. I will update to version 1.8.1 after this week, if it's
>>> > an issue due to old models.
>>> >
>>> > Thanks.
>>> >
>>> > On Thu, Jul 6, 2017 at 3:19 PM, John Stewart <cane.c...@gmail.com> wrote:
>>> >
>>> > > What model or dictionary are you using with the lemmatizer?
>>> > >
>>> > > jds
>>> > >
>>> > > On Thu, Jul 6, 2017 at 6:05 PM, Ling <lingv...@gmail.com> wrote:
>>> > >
>>> > > > Hi, the problem with the lemma is that, for "tmoble", the lemma
>>> > > > returned by OpenNLP is "null", not "tmoble".
>>> > > >
>>> > > > Why is that?
>>> > > >
>>> > > > On Mon, Jul 3, 2017 at 6:54 PM, Rakesh P <rakeshbe...@gmail.com> wrote:
>>> > > >
>>> > > > > Hi,
>>> > > > > The stemmer works based on some predefined rules. An example of
>>> > > > > such a rule is "word that ends with 'e'". So, if you want to get
>>> > > > > a meaningful word after preprocessing, it is better to use
>>> > > > > lemmatization.
>>> > > > >
>>> > > > > Regards,
>>> > > > > Rakesh P
>>> > > > >
>>> > > > > > On 03-Jul-2017, at 10:24 PM, Ling <marlon...@gmail.com> wrote:
>>> > > > > >
>>> > > > > > Hi, I noticed that some words are stemmed like the following:
>>> > > > > >
>>> > > > > > iphone -> iphon
>>> > > > > > tmobile -> T-mobil
>>> > > > > >
>>> > > > > > Is there some parameter to control this behavior? In such cases,
>>> > > > > > those stems are actually harmful, making them become unknown
>>> > > > > > words in text. Since these are quite common, I am just curious
>>> > > > > > whether there is a way to change the default behavior.
>>> > > > > >
>>> > > > > > Thanks.
>>> > > > > > Ling
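
For reference, a minimal sketch of one possible workaround for the two problems raised in this thread: stems that damage brand names, and a lemma that comes back as "null". It is not OpenNLP code and no OpenNLP parameter is implied. All names here (toy_stem, make_protected_stemmer, lemma_or_token) are hypothetical, and toy_stem is a one-rule stand-in for illustration only, not the Porter algorithm. The idea is to wrap whatever stemmer is in use with a list of protected words, and to fall back to the surface token when no lemma is returned.

```python
from typing import Callable, Iterable, Optional


def toy_stem(token: str) -> str:
    """Stand-in stemmer with a single rule: drop one trailing 'e'.

    This is NOT the Porter algorithm; it only mimics the kind of
    suffix rule described in the thread.
    """
    if len(token) > 3 and token.endswith("e"):
        return token[:-1]
    return token


def make_protected_stemmer(
    stem: Callable[[str], str], protected: Iterable[str]
) -> Callable[[str], str]:
    """Wrap a stemmer so that protected words are returned unchanged."""
    keep = {word.lower() for word in protected}

    def stem_unless_protected(token: str) -> str:
        # Case-insensitive lookup; the original token is returned as-is.
        if token.lower() in keep:
            return token
        return stem(token)

    return stem_unless_protected


def lemma_or_token(lemma: Optional[str], token: str) -> str:
    """Fall back to the surface token when no lemma is available."""
    return lemma if lemma else token


if __name__ == "__main__":
    stem = make_protected_stemmer(toy_stem, ["iphone", "tmobile"])
    for word in ["iphone", "tmobile", "phone"]:
        print(word, "->", stem(word))
    print(lemma_or_token(None, "tmoble"))
```

In a real pipeline the stem argument would be a call into the actual stemmer, and the protected list would hold the product and brand names that must stay intact.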