Hi all,

I am testing OpenNLP and have found a significant tokenization issue
involving punctuation. For example:

Thank you Costco!
i love costco!
I love Costco!!
FUCK IKEA.

In all of these cases, the final punctuation mark is not split off, so
"Costco!" and "IKEA." are each treated as a single token. This looks like
a systematic problem.
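For comparison, here is a minimal sketch of the output I would expect. It uses a plain regular expression in Python and is NOT OpenNLP's algorithm; `reference_tokenize` is a hypothetical helper I wrote only as a baseline to compare OpenNLP's output against:

```python
import re

# Hypothetical reference tokenizer (NOT OpenNLP's algorithm):
# a run of word characters is one token, and every other
# non-space character (e.g. "!" or ".") becomes its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def reference_tokenize(sentence: str) -> list[str]:
    """Split a sentence into word tokens and single-character punctuation tokens."""
    return TOKEN_RE.findall(sentence)


samples = [
    "Thank you Costco!",
    "i love costco!",
    "I love Costco!!",
    "FUCK IKEA.",
]

for s in samples:
    # e.g. "Thank you Costco!" -> ['Thank', 'you', 'Costco', '!']
    print(reference_tokenize(s))
```

With this pattern "Costco!!" yields two separate "!" tokens; using `[^\w\s]+` instead would keep "!!" together. Either way, the trailing punctuation is separated from the word, which is what I expected OpenNLP to do.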
Before I file an issue with the OpenNLP project, I want to make sure the
problem really comes from the library itself.

Have any of you encountered a similar problem? Thanks.
