Re: Stemming in openNLP

Joern Kottmann Fri, 07 Jul 2017 01:19:03 -0700

And also Snowball:
http://snowball.tartarus.org/


Jörn

On Fri, Jul 7, 2017 at 9:10 AM, Rodrigo Agerri <rodrigo.age...@ehu.eus> wrote:
> Hello,
>
> The stemmer algorithm implemented in OpenNLP is this one:
>
> https://tartarus.org/martin/PorterStemmer/
>
> Regarding the "null" lemma, are you using OpenNLP to lemmatize?
>
> Rodrigo
>
> On Fri, Jul 7, 2017 at 5:47 AM, Ling <lingv...@gmail.com> wrote:
>> I use it indirectly through another library; there is a function
>> token.getLemma().
>>
>> On Jul 6, 2017 7:24 PM, "John Stewart" <cane.c...@gmail.com> wrote:
>>
>>> I'm asking because I thought there were no pre-trained models for the
>>> lemmatizer. How are you using it exactly? There's also an option to use a
>>> dictionary, e.g.
>>> https://stackoverflow.com/questions/38982423/opennlp-lemmatization-example
>>>
>>> AFAIK the models in 1.8.1 are the same as 1.5.3
>>>
>>> jds
>>>
>>> On Thu, Jul 6, 2017 at 6:26 PM, Ling <lingv...@gmail.com> wrote:
>>>
>>> > OpenNLP 1.5.3. I will update to version 1.8.1 after this week, if the
>>> > issue is due to old models.
>>> >
>>> > Thanks.
>>> >
>>> > On Thu, Jul 6, 2017 at 3:19 PM, John Stewart <cane.c...@gmail.com> wrote:
>>> >
>>> > > What model or dictionary are you using with the lemmatizer?
>>> > >
>>> > > jds
>>> > >
>>> > > On Thu, Jul 6, 2017 at 6:05 PM, Ling <lingv...@gmail.com> wrote:
>>> > >
>>> > > > Hi, the problem with the lemma is that, for "tmoble", the lemma
>>> > > > returned by openNLP is "null", not "tmoble".
>>> > > >
>>> > > > Why is that?
>>> > > >
>>> > > > On Mon, Jul 3, 2017 at 6:54 PM, Rakesh P <rakeshbe...@gmail.com> wrote:
>>> > > >
>>> > > > > Hi,
>>> > > > > A stemmer works from predefined rules. An example of such a rule
>>> > > > > is one that applies to any "word that ends with 'e'". So, if you
>>> > > > > want a meaningful word after preprocessing, it is better to use
>>> > > > > lemmatization.
>>> > > > >
>>> > > > > Regards,
>>> > > > > Rakesh P
>>> > > > >
>>> > > > > > On 03-Jul-2017, at 10:24 PM, Ling <marlon...@gmail.com> wrote:
>>> > > > > >
>>> > > > > > Hi, I noticed that some words are stemmed like the following:
>>> > > > > >
>>> > > > > > iphone ->  iphon
>>> > > > > > tmobile -> T-mobil
>>> > > > > >
>>> > > > > > Is there some parameter to control this behavior? In such cases,
>>> > > > > > those stems are actually harmful, because they turn the words into
>>> > > > > > unknown words in the text. Since these are quite common, I am just
>>> > > > > > curious whether there is a way to change the default behavior.
>>> > > > > >
>>> > > > > > Thanks.
>>> > > > > > Ling
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
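One possible way to work around both problems raised in this thread (unwanted stems such as iphone -> iphon, and a null lemma for an unknown word) is to wrap whatever stemmer or lemmatizer is already in use. The following is a minimal sketch in Java under stated assumptions: GuardedStemmer, TOY_STEMMER and lemmaOrToken are hypothetical names and not part of OpenNLP, and the toy rule only stands in for a real stemmer such as the Porter stemmer mentioned above.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical wrapper, not an OpenNLP class: skips stemming for protected words.
class GuardedStemmer {

    // Toy stand-in for a real stemmer (assumption for illustration only):
    // strips one trailing 'e' from words longer than three characters.
    static final UnaryOperator<String> TOY_STEMMER =
            w -> (w.length() > 3 && w.endsWith("e")) ? w.substring(0, w.length() - 1) : w;

    private final UnaryOperator<String> stemmer;
    private final Set<String> protectedWords;

    GuardedStemmer(UnaryOperator<String> stemmer, Set<String> protectedWords) {
        this.stemmer = stemmer;
        this.protectedWords = protectedWords;
    }

    // Returns the token unchanged if it is protected, otherwise delegates to the stemmer.
    String stem(String token) {
        if (protectedWords.contains(token.toLowerCase(Locale.ROOT))) {
            return token;
        }
        return stemmer.apply(token);
    }

    // Falls back to the surface token when a lemmatizer has no answer (null lemma).
    static String lemmaOrToken(String lemma, String token) {
        return lemma != null ? lemma : token;
    }

    public static void main(String[] args) {
        GuardedStemmer guarded =
                new GuardedStemmer(TOY_STEMMER, Set.of("iphone", "tmobile"));
        System.out.println(guarded.stem("iphone"));       // protected, left as-is
        System.out.println(guarded.stem("phone"));        // stemmed by the toy rule
        System.out.println(lemmaOrToken(null, "tmoble")); // null lemma falls back to token
    }
}
```

In practice the UnaryOperator would delegate to the real stemmer, and the protected set would hold brand names or other domain terms that must survive preprocessing.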
