Hi all, I am testing OpenNLP and have found what looks like a significant tokenization issue involving punctuation.
My test sentences were: "Thank you Costco!", "i love costco!", "I love Costco!!" and "FUCK IKEA." In all of these cases the final punctuation is not split off, so "Costco!" and "IKEA." are each treated as a single token. This looks like a systematic problem. Before I file an issue with the OpenNLP project, I want to make sure the problem really comes from the library. Has anyone else encountered a similar problem? Thanks.
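For comparison, here is a minimal sketch of the output I would expect, using only the Java standard library. The class name `BaselineTokenizer` and its regular expression are my own assumptions for illustration; they are not part of OpenNLP and do not reproduce how any OpenNLP tokenizer works. Treating every punctuation character as its own token is also just one possible convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical baseline, NOT an OpenNLP class: it only shows the kind of
// split I would expect for sentence-final punctuation.
final class BaselineTokenizer {
    // Either a run of letters/digits, or one non-space character that is
    // neither a letter nor a digit (so each punctuation mark stands alone).
    private static final Pattern TOKEN =
        Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Thank you Costco!", "i love costco!", "I love Costco!!", "FUCK IKEA."
        };
        for (String s : samples) {
            // Print the input next to the tokens the baseline produces.
            System.out.println(s + " -> " + tokenize(s));
        }
    }
}
```

If the OpenNLP tokenizer under test returns "Costco!" where this baseline returns "Costco" followed by "!", that difference is the behavior I am asking about.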