Hi all,

I have started using OpenNLP with the available Swedish models. One thing I 
noticed is that the sentence detection model does not perform properly when a 
full stop is immediately followed by a newline character and the next sentence 
starts immediately after that. So the following example:
---
Hunden blir hundstjärnan, Sirius.
Artemis skyddade de gravida kvinnorna
---
will be segmented as:

<S>
Hunden blir hundstjärnan, Sirius.Artemis
</S>
<S>
skyddade de gravida kvinnorna
</S>

I am curious whether anyone has experienced similar problems with Swedish or 
other languages. Any ideas why this happens?

I wonder how one can alleviate this behaviour. One way is to train a new model, 
but I doubt this will help. Or would it? Another way is to substitute all 
newline characters with spaces. At the moment I concatenate all lines into a 
single string, to which I subsequently apply the sentence detection model. Is 
this the way it should be done, if I read the documentation correctly?
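
To make the newline substitution concrete, here is a minimal sketch in plain 
Java of what I have in mind. The class and method names (NewlineNormalizer, 
joinLines) are made up for illustration and are not part of OpenNLP. It only 
prepares the text; the result would then be handed to the sentence detector, 
e.g. SentenceDetectorME.sentDetect. Whether this actually cures the 
segmentation problem is an assumption I have not verified.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: replaces line breaks with a
// single space before the text is passed to a sentence detector.
final class NewlineNormalizer {

    // One or more line breaks (\R covers \n, \r\n and \r), together with
    // any horizontal whitespace around or between them.
    private static final Pattern LINE_BREAKS =
            Pattern.compile("(?:\\h*\\R)+\\h*");

    private NewlineNormalizer() {}

    // Returns the text with every run of line breaks collapsed to one space
    // and leading/trailing whitespace removed.
    static String joinLines(String text) {
        return LINE_BREAKS.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String raw = "Hunden blir hundstjarnan, Sirius.\n"
                + "Artemis skyddade de gravida kvinnorna";
        // The normalized string is what would go into the sentence detector.
        System.out.println(joinLines(raw));
    }
}
```

The point of the sketch is that the full stop ends up followed by a space 
rather than a newline (or nothing at all, if the lines were joined without a 
separator).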

Best regards,

Svetoslav
