Re: Swedish sentence detection model not performing properly

Jörn Kottmann Wed, 04 Jan 2012 13:49:46 -0800

Which OpenNLP version do you use?

We improved the "space" handling in the sentence detector for 1.5.2. If
you are still on 1.5.1, I suggest that you update.


Jörn

On 1/4/12 12:20 PM, Svetoslav Marinov wrote:

Hi all,

I have started using OpenNLP with the available Swedish models. One thing I
noticed is that the sentence detection model does not perform properly when the
full stop is immediately followed by a newline character and the next sentence
starts immediately after that. So the following example:
---
Hunden blir hundstjärnan, Sirius.
Artemis skyddade de gravida kvinnorna
---
will be segmented as:

<S>
Hunden blir hundstjärnan, Sirius.Artemis
</S>
<S>
skyddade de gravida kvinnorna
</S>

I am curious whether anyone has experienced similar problems with Swedish or
other languages. Any ideas why this happens?

I wonder how one can alleviate this behaviour. One way is to train a new model,
but I doubt this will help. Or would it? Another way is to substitute all
newline characters with spaces. I currently concatenate all lines into a single
string, to which I then apply the sentence detection model. Is this the way it
should be done, if I read the documentation correctly?
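
For illustration, the newline-substitution workaround could look roughly like
the sketch below. It only shows the preprocessing step; the class name
NewlineNormalizer and the method normalize are hypothetical helpers, not part
of OpenNLP, and the choice to also collapse whitespace around line breaks is an
assumption. The resulting string would then be handed to the sentence detector.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of OpenNLP): replaces line breaks with single
// spaces so that a full stop followed by a newline is still followed by
// whitespace when the text reaches the sentence detector.
public final class NewlineNormalizer {

    // One or more line breaks (\R covers \n, \r\n and \r), together with any
    // horizontal whitespace around them. Requires Java 8 or later.
    private static final Pattern LINE_BREAKS =
            Pattern.compile("\\h*(?:\\R\\h*)+");

    private NewlineNormalizer() {
    }

    // Returns the text with every run of line breaks replaced by one space
    // and leading/trailing whitespace removed.
    public static String normalize(String text) {
        return LINE_BREAKS.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Shortened version of the example from this message.
        String raw = "Sirius.\nArtemis skyddade de gravida kvinnorna";
        System.out.println(normalize(raw));
    }
}
```

With this in place, the input no longer contains "Sirius." directly followed
by a line break; whether that alone fixes the segmentation depends on the model
and the OpenNLP version in use.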

Best regards,

Svetoslav
