Basically, you need to know which punctuation marks indicate the end
of a sentence (or find someone who does), then use a regex to split
the text at those marks. It's not going to be perfect: you may have to
go over the result once or twice with your own eyes to make sure
everything is OK before training. Everything depends on the language
and how ambiguous its punctuation is.
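
Here is a rough sketch of the splitting described above, not a tested
recipe. The class name CjkSentenceSplitter and the set of end marks are
my own assumptions: I'm guessing at the ideographic full stop (U+3002),
the fullwidth exclamation and question marks (U+FF01, U+FF1F) and the
ASCII . ! ? that Korean text commonly uses. Someone who reads the
languages should confirm the list. The output is one sentence per line,
which is the layout the OpenNLP sentence detector trainer expects.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CjkSentenceSplitter {
    // Assumed end-of-sentence marks (verify with a native speaker):
    // U+3002 ideographic full stop, U+FF01 fullwidth exclamation mark,
    // U+FF1F fullwidth question mark, plus ASCII . ! ?
    private static final String END = "\u3002\uFF01\uFF1F.!?";

    // A sentence is a run of non-terminators followed by any terminators.
    private static final Pattern SENTENCE =
            Pattern.compile("[^" + END + "]+[" + END + "]*");

    /** Splits text into trimmed, non-empty sentence candidates. */
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            String s = m.group().trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    /** Reads each UTF-8 file named on the command line, prints one sentence per line. */
    public static void main(String[] args) throws IOException {
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true);
        for (String path : args) {
            for (String line : Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8)) {
                for (String sentence : split(line)) {
                    out.println(sentence);
                }
            }
        }
    }
}
```

This will mis-split exactly the ambiguous cases: decimal numbers such
as 3.14, abbreviations, and a closing quote mark that follows the full
stop ends up at the start of the next line. Those are the spots to fix
by hand before training.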
Jim
On 20/03/12 18:38, Jairo Sarabia wrote:
Hi all,
I see there are no Sentence Detector models for Asian languages in the
OpenNLP repository, and I need them.
I have to train Sentence Detector models for Chinese, Japanese and
Korean, but I don't know these languages.
How could I get the training data files for these languages?
Thanks in advance!
Jairo Sarabia