Is this a typical OpenNLP tokenization issue?

Ling Wed, 28 Jun 2017 19:04:39 -0700

Hi, all:

I am testing OpenNLP and have found a significant tokenization issue
involving punctuation. For example:


Thank you Costco!
i love costco!
I love Costco!!
FUCK IKEA.

In all these cases, the trailing punctuation is not split off, so "Costco!"
and "IKEA." are each treated as a single token. This looks like a systematic
problem. Before I file an issue with the OpenNLP project, I want to make sure
the problem really comes from the library.
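To make the expected behavior concrete, here is a small stdlib-only Java sketch contrasting a whitespace-only split (which reproduces the "Costco!" result above) with one that peels trailing punctuation into separate tokens. This is not OpenNLP code: the class and method names are hypothetical, and the punctuation set is an assumption. One thing worth ruling out first (also an assumption, not confirmed here) is whether the text went through a whitespace-only tokenizer such as OpenNLP's `WhitespaceTokenizer`, which leaves punctuation attached by design, rather than `SimpleTokenizer` or the model-based `TokenizerME`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PunctuationSplitDemo {

    // Whitespace-only split: trailing punctuation stays attached ("Costco!").
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Hypothetical helper: split on whitespace, then peel trailing
    // punctuation off each chunk as one-character tokens.
    static List<String> splitTrailingPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : whitespaceTokens(text)) {
            int end = chunk.length();
            // Walk backwards over trailing punctuation characters.
            while (end > 0 && isPunct(chunk.charAt(end - 1))) {
                end--;
            }
            if (end > 0) {
                tokens.add(chunk.substring(0, end));
            }
            // Emit each trailing punctuation character on its own.
            for (int i = end; i < chunk.length(); i++) {
                tokens.add(String.valueOf(chunk.charAt(i)));
            }
        }
        return tokens;
    }

    // Assumed punctuation set; extend as needed.
    static boolean isPunct(char c) {
        return ".!?,;:".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        String[] examples = {"Thank you Costco!", "i love costco!", "I love Costco!!"};
        for (String s : examples) {
            System.out.println(whitespaceTokens(s) + " -> " + splitTrailingPunctuation(s));
        }
    }
}
```

Here "Costco!!" becomes "Costco", "!", "!"; keeping "!!" as one token would be an equally reasonable design choice.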

Has anyone else encountered a similar problem? Thanks.

