Basically, you need to know the punctuation marks that end a sentence in each language, or find someone who does. Then use a regex to split the text into sentences at those marks! It's not going to be perfect: you may have to go over the result once or twice with your own eyes to make sure everything is OK before training. Everything depends on the language and how ambiguous its punctuation is.
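Here is a minimal sketch of that regex approach in Python. It assumes the sentence-final marks are the full-width 。！？ (common in Chinese and Japanese) plus the ASCII . ! ? (Korean mostly uses these). That set is an assumption and should be confirmed by a speaker of each language. It also assumes the OpenNLP sentence detector training format is one sentence per line, with an empty line marking a document boundary, so check this against the OpenNLP manual. The names `split_sentences` and `write_training_file` are hypothetical helpers for illustration, not part of OpenNLP.

```python
import re

# ASSUMPTION: sentence-final marks. Full-width 。！？ are common in Chinese and
# Japanese; Korean mostly uses ASCII . ! ?  Verify with a native speaker.
TERMINATORS = "。！？!?."
_T = re.escape(TERMINATORS)

# A sentence = a run of non-terminators, then one or more terminators,
# then optional closing quotes/brackets such as 」 or 』.
SENTENCE_RE = re.compile(f"[^{_T}]+(?:[{_T}]+[」』）\"']*)?")


def split_sentences(text: str) -> list[str]:
    """Hypothetical helper: split text at the assumed terminators."""
    sentences = []
    for match in SENTENCE_RE.finditer(text):
        sentence = match.group().strip()  # drop spaces between sentences
        if sentence:
            sentences.append(sentence)
    return sentences


def write_training_file(path: str, documents: list[str]) -> None:
    """Hypothetical helper: one sentence per line, empty line between documents."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            for sentence in split_sentences(doc):
                f.write(sentence + "\n")
            f.write("\n")  # assumed OpenNLP document-boundary marker
```

For example, `split_sentences("今天天气很好。我们去公园吧！")` should return two sentences, and a closing 」 stays attached to the sentence it ends. As said above, the output still needs a manual pass before training. The ASCII period also appears in numbers and abbreviations, so a string like "3.5" is split in two.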

Jim

On 20/03/12 18:38, Jairo Sarabia wrote:
Hi all,

I see there aren't any Sentence Detect Models for Asian languages in the openNLP
repository, and I need them.
I have to train Sentence Detect Models for Chinese, Japanese and Korean,
but I don't know these languages.
How could I get the training data files for these languages?

Thanks in advance!

Jairo Sarabia

