Hi OpenNLP-users,

I have one question about the pretrained model for the German sentence
detector.

The documentation says:

"Usually Sentence Detection is done **before** the text is tokenized and
that's the way the pre-trained models on the web site are trained"

So how exactly was the provided German model trained? The TIGER
corpus IS tokenized - so was the TIGER corpus detokenized for training?
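For illustration, I imagine the detokenization step would be something along these lines (just a rough sketch on my part - OpenNLP itself ships a rule-based DictionaryDetokenizer for this, so the real preprocessing may well differ):

```python
import re

def detokenize(tokens):
    """Naively join tokens back into running text, reattaching
    common punctuation to the preceding word. A crude sketch only;
    a proper detokenizer uses per-token attachment rules."""
    text = " ".join(tokens)
    # attach sentence-final and clause punctuation to the previous token
    text = re.sub(r" ([.,;:!?])", r"\1", text)
    return text

# e.g. detokenize(["Das", "ist", "ein", "Satz", "."])
```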

Is there any documentation available so that I can reproduce the
training steps for the pretrained model?

Thanks + regards,

Stefan

